Remote
Data Engineer (Public Trust Required) - ICF
Data Engineer
Detail‑oriented Data Engineer building and optimizing large‑scale healthcare data pipelines for cancer registry and population‑based datasets, using Python, SQL, cloud services, and modern ETL tools.
About the role
Key Responsibilities
- Design, develop, and maintain scalable ETL pipelines that ingest, transform, and load high‑volume cancer registry data in NAACCR format.
- Implement data models and analytical datasets in cloud data warehouses (e.g., Snowflake) to support research, reporting, and policy analysis.
- Automate workflow orchestration using Apache Airflow or similar tools, ensuring reliable daily and batch processing.
- Collaborate with epidemiologists, analysts, and CDC partners to translate domain requirements into technical specifications.
- Monitor pipeline performance, troubleshoot data quality issues, and optimize pipelines for cost‑effective execution on AWS.
Requirements
- 3+ years of professional experience building data pipelines in Python and SQL.
- Hands‑on experience with cloud platforms (AWS) and data warehouse solutions such as Snowflake or Redshift.
- Proficiency in ETL/ELT design, data modeling, and workflow orchestration tools (Airflow, Prefect, etc.).
- Familiarity with healthcare data standards, especially NAACCR registry formats, is strongly preferred.
- Ability to obtain and maintain a Public Trust clearance.
Skills
Python, SQL, AWS, Snowflake