onsite
Data Engineer - BHI Biohealth International GmbH
The Data Engineer is responsible for designing, building, and maintaining scalable data pipelines and a lakehouse architecture using Python, SQL, and AWS services, ensuring high data quality and performance for analytics and ML workloads.
About the role
Key Responsibilities
- Design, develop, and maintain robust data pipelines that load data from diverse sources into a cloud data lake and warehouse using Python, SQL, and Spark.
- Implement data modeling, schema evolution, and partitioning strategies to optimize query performance and storage costs on AWS services such as S3, Redshift, and Athena.
- Collaborate with data scientists and analysts to provide clean, well‑documented datasets for machine learning and business intelligence projects.
- Monitor pipeline health, troubleshoot failures, and continuously improve reliability through automated testing and CI/CD practices.
- Ensure data governance, security, and compliance by applying encryption, access controls, and audit logging.
Requirements
- Proven experience as a Data Engineer or in a similar role, with strong Python and SQL skills.
- Hands‑on experience with AWS data services (S3, Redshift, Glue, Athena, EMR) and Spark/Databricks.
- Solid understanding of data modeling, ETL concepts, and performance tuning.
- Familiarity with CI/CD pipelines, version control (Git), and containerization (Docker).
- Excellent problem‑solving skills and the ability to work collaboratively in a fast‑paced environment.