Onsite
Senior Data Engineer - Large Driving Model Autonomy - Rivian
Data Engineer
Senior Data Engineer responsible for designing and scaling data pipelines that feed large autonomous driving models, leveraging Python, Spark, SQL, and AWS to deliver reliable, high‑throughput data for machine‑learning workloads.
About the role
Key Responsibilities
- Design, build, and maintain robust, scalable data pipelines that ingest, transform, and store sensor and simulation data for autonomous driving models.
- Collaborate with ML scientists and software engineers to define data requirements and ensure data quality, latency, and availability.
- Implement data processing solutions using Apache Spark and SQL on AWS services such as S3, Redshift, and Glue.
- Develop monitoring, alerting, and automated testing frameworks to ensure pipeline reliability and performance at scale.
- Optimize storage and compute costs while maintaining compliance with security and governance standards.
Requirements
- 5+ years of experience building large‑scale data pipelines in Python and Spark.
- Strong proficiency with SQL and relational/columnar data stores (e.g., Redshift, Snowflake, BigQuery).
- Hands‑on experience with AWS data services (S3, EMR, Glue, Lambda) and infrastructure‑as‑code tools.
- Familiarity with machine‑learning data workflows and dataset versioning for autonomous vehicle data.
- Excellent problem‑solving skills and ability to work cross‑functionally in a fast‑paced, innovative environment.
Skills
Python, Apache Spark, SQL, AWS