Remote / Onsite
Principal Consultant - Databricks Developer - Genpact
Software Engineer
Senior data engineering role focused on architecting, building, and deploying scalable Databricks solutions using Spark, Python, and Scala, with expertise in Delta Lake, MLflow, and AWS cloud services.
About the role
Key Responsibilities
- Design, develop, and maintain end‑to‑end data pipelines on Databricks, leveraging Spark, Delta Lake, and Python/Scala.
- Lead the implementation of MLflow for experiment tracking, model registry, and production deployment.
- Collaborate with data scientists and product teams to translate business requirements into scalable data solutions.
- Optimize performance and cost of Spark jobs on AWS infrastructure, including cluster sizing and resource tuning.
- Mentor junior engineers, conduct code reviews, and promote best practices in data engineering and DevOps.
Requirements
- 10+ years of data engineering experience with a strong focus on Databricks and Spark.
- Proficient in Python and Scala, with hands‑on experience building production pipelines.
- Deep knowledge of Delta Lake, MLflow, and AWS services (EMR, S3, Glue, Redshift).
- Experience leading technical teams and driving architectural decisions.
- Excellent communication skills and the ability to explain complex technical concepts to non‑technical stakeholders.
Skills
Databricks, Apache Spark, Python, Scala, MLflow, AWS