Remote
Data Engineer - Stitch Fix Annex
Build scalable data pipelines on AWS that transform raw data into actionable insights using Python, Spark, and Airflow. The role focuses on data quality, performance, and collaboration with data science teams.
About the role
Key Responsibilities
- Design, develop, and maintain end‑to‑end data pipelines that ingest, transform, and load large volumes of structured and unstructured data into the data warehouse.
- Optimize Spark jobs and SQL queries for performance and cost efficiency on AWS services such as EMR, Redshift, and S3.
- Implement and manage Airflow DAGs to orchestrate data workflows, ensuring reliability and observability.
- Collaborate with data scientists and product teams to understand data requirements and deliver clean, well‑documented datasets.
- Monitor pipeline health, troubleshoot failures, and proactively address data quality issues.
Requirements
- 3+ years of experience as a Data Engineer or in a similar role, with strong proficiency in Python and SQL.
- Hands‑on experience with Apache Spark, AWS Glue, and data warehousing solutions.
- Solid understanding of data modeling, ETL best practices, and performance tuning.
- Experience orchestrating workflows with Airflow or an equivalent scheduler.
- Excellent problem‑solving skills and ability to work collaboratively in a fast‑paced environment.
Skills
Python, SQL, Apache Spark, AWS, Airflow