Remote / Onsite
Databricks Engineer - Persistent Systems
Software Engineer
Lead end‑to‑end data engineering on Databricks: build scalable Spark pipelines, data lakes on Delta Lake, and MLflow‑driven ML workflows in a cloud environment.
About the role
Key Responsibilities
- Design, develop, and maintain large‑scale Spark pipelines on Databricks for batch and streaming data.
- Implement Delta Lake best practices for ACID transactions, schema evolution, and data versioning.
- Integrate MLflow for experiment tracking, model packaging, and deployment pipelines.
- Collaborate with data scientists to optimize feature engineering and model training workflows.
- Automate data workflows using CI/CD pipelines and monitor job performance and data quality.
Requirements
- 3+ years of experience building data pipelines with Databricks and Apache Spark.
- Strong proficiency in Python and SQL; experience with Scala or Java is a plus.
- Hands‑on experience with Delta Lake, MLflow, and cloud data services (AWS, Azure, or GCP).
- Solid understanding of data modeling, ETL best practices, and performance tuning.
- Excellent problem‑solving skills and the ability to work collaboratively in a fast‑paced environment.
Skills
Databricks, Apache Spark, Python, MLflow, AWS