Remote / Onsite
Senior Data Engineer - Amgen
Data Engineer
Lead end‑to‑end data pipeline development, architect scalable data solutions on AWS, and optimize large‑scale ETL processes using Python and Spark to support advanced analytics and machine learning initiatives.
About the role
Key Responsibilities
- Design, build, and maintain robust data pipelines and workflows on AWS, ensuring high availability and performance for large datasets.
- Develop and optimize ETL processes using Python, SQL, and Apache Spark to ingest, transform, and load data from diverse sources.
- Collaborate with data scientists and analysts to understand data requirements and deliver clean, well‑documented datasets for modeling and reporting.
- Implement data quality checks, monitoring, and alerting to ensure data integrity and reliability.
- Mentor junior engineers, conduct code reviews, and promote best practices in data engineering and DevOps.
Requirements
- 5+ years of experience in data engineering, with a strong background in Python, SQL, and Spark.
- Proven expertise in AWS services such as S3, Redshift, Glue, EMR, and Lambda.
- Solid understanding of data modeling, schema design, and performance tuning for large‑scale data warehouses.
- Experience with CI/CD pipelines, version control (Git), and containerization (Docker).
- Excellent problem‑solving skills and ability to work collaboratively in a fast‑paced environment.
Skills
Python, SQL, AWS, Apache Spark