Remote / Onsite
Senior Databricks Engineer - Persistent Systems
Software Engineer
Lead end‑to‑end data engineering on Databricks, building scalable Spark pipelines, Delta Lake‑based data lakes, and MLflow‑driven ML workflows in a cloud environment.
About the role
Key Responsibilities
- Design, develop, and maintain large‑scale Spark pipelines on Databricks, ensuring performance, reliability, and scalability.
- Implement Delta Lake best practices for ACID transactions, schema evolution, and data versioning.
- Integrate MLflow for experiment tracking, model registry, and deployment pipelines.
- Collaborate with data scientists and analysts to translate business requirements into efficient data solutions.
- Automate data workflows using CI/CD pipelines and orchestrate jobs across cloud platforms (AWS, Azure, or GCP).
- Monitor, troubleshoot, and optimize job performance, providing actionable insights to stakeholders.
Requirements
- 5+ years of experience in data engineering with a focus on Databricks and Spark.
- Experience with version control and CI/CD tools (Git, Jenkins, Azure DevOps) and with containerization and orchestration (Docker, Kubernetes).
- Excellent problem‑solving skills and the ability to communicate complex concepts to non‑technical audiences.
Skills
Databricks, Apache Spark, Python, MLflow, AWS, CI/CD