remote
Data Engineer II - Transportation Execution Speed Team - Amazon
Data Engineer
Data Engineer II focused on real‑time logistics, designing and operating scalable data pipelines on AWS, using Python, Spark, and Airflow to integrate heterogeneous sources and optimize delivery speed across a global transportation network.
About the role
Key Responsibilities
- Design, build, and maintain high‑throughput data pipelines that ingest, transform, and serve real‑time logistics data from multiple heterogeneous sources.
- Implement scalable ETL workflows using Python, Apache Spark, and AWS services (Glue, S3, Redshift, Athena).
- Collaborate with data scientists and product teams to define data models, schemas, and performance metrics for delivery speed optimization.
- Monitor pipeline health, troubleshoot failures, and continuously improve reliability and latency.
- Document architecture, data flows, and best practices for cross‑team knowledge sharing.
Requirements
- 3+ years of experience as a data engineer in a large, distributed environment.
- Proficiency in Python, SQL, and Spark for data processing and transformation.
- Hands‑on experience with AWS data services (Glue, Redshift, Athena, S3) and workflow orchestration (Airflow).
- Strong understanding of data modeling, schema design, and performance tuning.
- Excellent problem‑solving skills and the ability to work independently in a fast‑paced, mission‑critical setting.
Skills
Python, SQL, AWS, Apache Spark, Airflow