Lead the design, development, and maintenance of large‑scale data pipelines that ingest, transform, and expose data for analytics and machine‑learning workloads. Work closely with data scientists, product managers, and platform teams to translate business requirements into robust, reusable data services.
Key Responsibilities
- Architect and implement end‑to‑end ETL workflows using Python, Spark, and SQL on AWS services such as S3, Redshift, and Glue.
- Optimize data models and storage strategies to support real‑time and batch analytics at petabyte scale.
- Collaborate with cross‑functional teams to define data quality standards and to set up monitoring and alerting.
- Mentor junior engineers and champion best practices in code quality, testing, and CI/CD.
- Evaluate and adopt emerging data‑engineering technologies to improve performance and cost efficiency.
Requirements
- 8+ years of software engineering experience with a focus on data engineering.
- Proficiency in Python, SQL, and Apache Spark for large‑scale data processing.
- Hands‑on experience with AWS data services (Glue, Redshift, Athena, EMR).
- Strong understanding of data modeling, including dimensional design, and of ETL best practices.
- Excellent problem‑solving skills and a track record of delivering production‑grade solutions.