Onsite
Senior/Staff Machine Learning Engineer - Data Infrastructure
ML Engineer
Lead the design and deployment of scalable ML pipelines on data lake and warehouse platforms, orchestrating workflows with Apache Airflow and Spark while ensuring robust automated testing and production reliability.
About the role
Key Responsibilities
- Architect and implement end‑to‑end machine learning pipelines on large‑scale data lake and warehouse environments.
- Design and maintain Airflow DAGs to orchestrate data ingestion, feature engineering, model training, and deployment workflows.
- Leverage Apache Spark for distributed data processing and model training at scale.
- Develop and maintain automated testing suites (unit, integration, and performance) to guarantee pipeline reliability.
- Collaborate with data engineering and data science teams to optimize data pipelines and model performance.
- Document architecture, processes, and best practices for internal knowledge sharing.
Requirements
- 10+ years of experience in software engineering with a strong focus on machine learning and data infrastructure.
- Proficiency in Python, Apache Airflow, Apache Spark, and SQL.
- Hands‑on experience building and scaling data lakes and data warehouses (e.g., Snowflake, BigQuery, Redshift).
- Deep understanding of automated testing frameworks and CI/CD pipelines for data workflows.
- Excellent communication skills and the ability to mentor junior engineers.
Skills
machine learning, Apache Spark