Remote / Onsite
Lead Data Engineer - Genpact
Data Engineer
Lead the design, development, and deployment of scalable data pipelines and AI solutions using Python, Spark, and AWS, driving enterprise‑wide data strategy and innovation.
About the role
Key Responsibilities
- Architect and implement end‑to‑end data pipelines that ingest, transform, and store large volumes of structured and unstructured data.
- Collaborate with data scientists and product teams to deploy machine learning models into production environments.
- Design and maintain data lake and warehouse solutions on AWS, ensuring high availability, security, and performance.
- Lead containerization and orchestration of data services using Docker and Kubernetes for scalable, resilient deployments.
- Mentor and guide a team of data engineers, fostering best practices in coding, testing, and documentation.
Requirements
- 5+ years of experience in data engineering with a strong focus on big data technologies.
- Proficiency in Python, SQL, and Apache Spark for data processing and transformation.
- Hands‑on experience with AWS services (S3, Redshift, Glue, EMR, Athena) and data lake architecture.
- Solid understanding of containerization (Docker) and orchestration (Kubernetes) for data workloads.
- Excellent communication skills and a proven ability to lead cross‑functional teams in a fast‑paced environment.
Skills
Python, SQL, Apache Spark, AWS, Docker, Kubernetes