remote
Data Engineer - THG
Build and maintain scalable data pipelines on AWS, transforming raw data into actionable insights using Python, SQL, and Spark. Drive data quality, performance, and automation for a global e‑commerce platform.
About the role
Key Responsibilities
- Design, develop, and maintain end‑to‑end data pipelines on AWS (Glue, Redshift, S3) to ingest, transform, and load large volumes of structured and semi‑structured data.
- Implement robust ETL processes using Python, SQL, and Apache Spark, ensuring data quality, consistency, and performance.
- Collaborate with data scientists, product managers, and business stakeholders to understand data requirements and deliver timely, high‑quality datasets.
- Monitor pipeline health, troubleshoot issues, and optimize performance through profiling, indexing, and query tuning.
- Document data models, pipeline logic, and best practices to support knowledge sharing and compliance.
Requirements
- 3+ years of experience as a Data Engineer or in a similar role in a fast‑paced environment.
- Strong proficiency in Python and SQL, plus experience with Spark or a similar distributed processing framework.
- Hands‑on experience with AWS services (Glue, Redshift, S3, Lambda, Athena).
- Solid understanding of data warehousing concepts, dimensional modeling, and ETL best practices.
- Excellent problem‑solving skills, attention to detail, and ability to communicate complex technical concepts to non‑technical stakeholders.
Skills
Python, SQL, AWS, Apache Spark