Remote
Assistant Director, Data Engineering - University of Colorado Anschutz Medical Campus
Software Engineer
Lead data engineering initiatives: design and maintain scalable pipelines on AWS, using Python, SQL, and Spark to transform and deliver high‑quality datasets for research and clinical analytics.
About the role
Key Responsibilities
- Architect, develop, and maintain robust data pipelines using Python, SQL, and Spark on AWS services (S3, Redshift, EMR).
- Collaborate with data scientists, researchers, and IT teams to define data requirements and ensure data quality and integrity.
- Implement ETL processes, automate workflows, and monitor pipeline performance using Airflow or similar orchestration tools.
- Optimize data storage and retrieval strategies, including partitioning, indexing, and compression techniques.
- Document data models, pipeline logic, and best practices for reproducibility and compliance.
Requirements
- 5+ years of experience in data engineering or a related field.
- Proficiency in Python, SQL, and Spark for large‑scale data processing.
- Hands‑on experience with AWS data services (S3, Redshift, EMR, Glue).
- Strong understanding of ETL concepts, data warehousing, and data governance.
- Excellent problem‑solving skills and ability to work collaboratively in a multidisciplinary environment.