onsite
Data Engineer - Ingestion & Pipelines - Programming.com
Seeking an experienced Data Engineer who specializes in building scalable ingestion pipelines and real‑time data flows with Python, SQL, Airflow, Kafka, Spark, and AWS services.
About the role
Key Responsibilities
- Design, develop, and maintain end‑to‑end data ingestion pipelines for large‑scale financial compliance data.
- Implement real‑time streaming solutions using Apache Kafka and batch processing with Apache Spark.
- Orchestrate workflows and schedule jobs with Apache Airflow, ensuring reliability and observability.
- Write performant data transformation scripts in Python and SQL, adhering to best practices for data quality and governance.
- Deploy and manage pipeline components on AWS (S3, Redshift, EMR, Lambda) and monitor resource utilization.
- Collaborate with data scientists, analysts, and product teams to translate business requirements into scalable data solutions.
Requirements
- 5+ years of professional experience building data pipelines and ETL processes.
- Strong proficiency in Python and SQL for data manipulation and automation.
- Hands‑on experience with Apache Airflow, Kafka, and Spark in production environments.
- Solid understanding of AWS data services (S3, Redshift, EMR, Lambda) and cloud‑native architecture.
- Demonstrated ability to work cross‑functionally, troubleshoot complex data issues, and optimize performance.
Skills
Python, SQL, Apache Spark, AWS