The Data Scientist at Georgetown University's CSET will develop AI-enabled solutions, build data pipelines, and design research to analyze national security policy trends. The role requires strong skills in machine learning, NLP, and data visualization, along with the ability to communicate findings effectively.
About the role
The Data Scientist will be part of the Center for Security and Emerging Technology (CSET) at Georgetown University. This role involves a blend of research, data engineering, and the development of AI-enabled solutions, focusing on national security policy trends.
Responsibilities
Build data pipelines to ingest and process large structured and unstructured datasets.
Conduct research design and identify appropriate data methods.
Implement data extraction, classification, clustering, annotation, and entity resolution.
Improve dataset quality and utility for analytical purposes.
Design, evaluate, and implement modeling solutions.
Develop AI-enabled solutions.
Monitor AI, machine learning, biotechnology, and national security policy trends.
Interpret results and draw inferences from data analyses.
Create accessible visualizations and web interfaces to present findings.
Document and communicate methods and data resources to both technical and policy audiences.
Requirements
Proficiency in a programming language such as Python and in database querying with SQL.
Experience with cloud platforms like AWS, Azure, or Google Cloud Platform.
Familiarity with data pipeline tools like Airflow.
Skills in machine learning, natural language processing (NLP), large language models, and text processing.
Experience with data modeling and developing data-driven solutions.
A Bachelor of Science degree is required.
Ability to communicate complex findings to diverse audiences.
Skills
AWS, Airflow, Annotation, Azure, Classification, Cloud platform, Clustering, Data Modeling, Data Pipelines, Entity Resolution, GitHub, Google Cloud, Google Cloud Platform, Language Models, Language Processing, Large Language Models, Machine Learning, Natural Language, Natural Language Processing, Python, SQL, Text Processing