remote
Senior AI Data Pipeline Engineer, Autonomous Driving - 42dot
Software Engineer
Lead the design and implementation of scalable AI data pipelines for autonomous driving, leveraging Python, Spark, and AWS to ingest, process, and serve high‑volume sensor data to downstream ML models.
About the role
Key Responsibilities
- Architect and develop end‑to‑end data pipelines that ingest raw vehicle sensor streams, perform real‑time preprocessing, and store processed data in cloud data lakes.
- Optimize Spark jobs and SQL queries for performance and cost efficiency on AWS EMR and Redshift.
- Collaborate with ML teams to expose clean, labeled datasets for training autonomous driving models.
- Implement CI/CD workflows using Docker and Kubernetes to deploy pipeline components with zero downtime.
- Monitor pipeline health, troubleshoot failures, and implement automated alerting and recovery mechanisms.
Requirements
- 5+ years of experience in data engineering, with a strong focus on large‑scale batch and streaming pipelines.
- Proficiency in Python, Apache Spark, and SQL; experience with AWS services (S3, EMR, Redshift, Glue).
- Hands‑on experience building containerized services and orchestrating them on Kubernetes.
- Solid understanding of data quality, lineage, and metadata management.
- Excellent problem‑solving skills and ability to work in a fast‑paced, cross‑functional team.
Skills
Python, Apache Spark, AWS, Docker, Kubernetes, SQL