Remote
Principal Research Data Engineer - Bayer
Data Engineer
Lead advanced data engineering initiatives, architecting scalable pipelines and data models to support research analytics. Leverage Python, Spark, and AWS to deliver high‑performance, reproducible data solutions for scientific discovery.
About the role
Key Responsibilities
- Design, develop, and maintain large‑scale data pipelines that ingest, transform, and serve research data across multiple domains.
- Collaborate with data scientists and domain experts to define data models, schemas, and metadata standards that enable reproducible research.
- Implement performance‑optimized solutions using Apache Spark, Python, and SQL on AWS infrastructure (EMR, Redshift, S3).
- Ensure data quality, lineage, and governance through automated testing, monitoring, and documentation.
- Mentor and guide junior engineers, fostering best practices in coding, version control, and DevOps.
Requirements
- 10+ years of experience in data engineering, with a strong focus on research or scientific data environments.
- Proficiency in Python, SQL, and Apache Spark for large‑scale data processing.
- Hands‑on experience deploying and managing data solutions on AWS (EMR, Redshift, S3, Glue).
- Deep understanding of data modeling, ETL design, and data governance principles.
- Excellent communication skills and a proven ability to translate complex technical concepts for cross‑functional teams.
Skills
Python, SQL, Apache Spark, AWS, machine learning