Remote
ML Data Engineer - UST
Data Engineer
Design and operate scalable data pipelines for machine‑learning workloads, leveraging Python, Spark, Airflow and AWS services to deliver reliable, production‑grade data solutions.
About the role
Key Responsibilities
- Architect, develop, and maintain end‑to‑end data pipelines that feed machine‑learning models in production.
- Implement data ingestion, transformation, and validation using Python, SQL, and Apache Spark on cloud infrastructure.
- Orchestrate workflows with Apache Airflow, adding monitoring and alerting so that critical data jobs run reliably.
- Collaborate with data scientists and analytics teams to define data requirements and optimize feature engineering processes.
- Deploy containerized services with Docker (and optionally Kubernetes) on AWS, managing storage and compute resources and following security best practices.
Requirements
- 3+ years of hands‑on experience building data pipelines for ML or analytics workloads.
- Proficiency in Python, SQL, and Spark for large‑scale data processing.
- Strong knowledge of AWS services (S3, Redshift, EMR, Lambda, etc.) and infrastructure‑as‑code concepts.
- Experience with workflow orchestration tools such as Apache Airflow.
- Familiarity with containerization (Docker) and CI/CD pipelines for data engineering deployments.
Skills
Python, SQL, Apache Spark, Airflow, AWS, Machine Learning, Docker