We are looking for a Data Engineer to build and maintain high‑performance data pipelines on AWS. In this role you will use Python, Spark, AWS Glue, Airflow, Apache Flink, and Hive to ingest, transform, and serve data for analytics and machine‑learning workloads.
Key Responsibilities
- Design, develop, and deploy scalable ETL pipelines using Spark and Python on AWS Glue.
- Orchestrate data workflows with Airflow, ensuring reliability and observability.
- Implement real‑time streaming pipelines with Apache Flink and batch processing jobs with Hive.
- Optimize job performance, monitor resource usage, and troubleshoot failures.
- Collaborate with data scientists and product teams to translate business requirements into technical solutions.
Requirements
- 3+ years of experience building data pipelines in a cloud environment.
- Strong proficiency in Python, Spark, and SQL.
- Hands‑on experience with AWS Glue, Airflow, Apache Flink, and Hive.
- Solid understanding of data modeling, partitioning, and performance tuning.
- Excellent problem‑solving skills and a proactive, collaborative mindset.