Remote
Senior Specialist, Data Engineer Upstream Biologics - Merck
Data Engineer
Senior Data Engineer role leading the design, development, and optimization of scalable data pipelines and analytics platforms for upstream biologics, using Python, Spark, SQL, and AWS services.
About the role
Key Responsibilities
- Design, build, and maintain robust, high‑performance data pipelines that ingest, transform, and store large‑scale biologics research data.
- Develop and optimize Spark jobs and SQL queries to support downstream analytics, machine‑learning models, and reporting.
- Implement data‑modeling standards and data‑warehouse architectures on AWS (e.g., Redshift, S3, Glue) to ensure data integrity and accessibility.
- Collaborate with scientists, bioinformaticians, and IT teams to translate research requirements into scalable data solutions.
- Automate ETL workflows, monitor performance, and troubleshoot production issues using Linux‑based tools and cloud monitoring services.
Requirements
- 5+ years of professional experience in data engineering, preferably in biopharma or life‑science environments.
- Strong proficiency in Python, SQL, and Apache Spark for large‑volume data processing.
- Hands‑on experience with AWS services (Redshift, S3, Glue, Lambda) and infrastructure‑as‑code concepts.
- Solid understanding of data modeling, schema design, and ETL best practices.
- Excellent problem‑solving skills, ability to work cross‑functionally, and effective communication of technical concepts to non‑technical stakeholders.
Skills
Python, SQL, Apache Spark, AWS, Linux