Onsite
Data Engineer PySpark - IT First Source
Data Engineer
We are seeking a Data Engineer to design, build, and maintain scalable data pipelines using PySpark, Python, and cloud services. These pipelines will deliver reliable data for analytics and business intelligence.
About the role
Key Responsibilities
- Design, develop, and optimize end‑to‑end data pipelines using PySpark and Python to ingest, transform, and store large‑scale datasets.
- Implement and manage data workflows in Apache Airflow, ensuring reliable scheduling, monitoring, and error handling.
- Collaborate with data analysts and data scientists to understand data requirements and deliver clean, well‑documented data assets.
- Maintain and tune data storage solutions on AWS (e.g., S3, Redshift, RDS) for performance, cost efficiency, and security.
- Apply best practices for data quality, lineage, and governance, including automated testing and validation.
Requirements
- 3+ years of professional experience in data engineering, with a focus on PySpark and Python.
- Strong SQL skills and experience building data models in relational or columnar databases.
- Hands‑on experience with AWS services such as S3, Redshift, Glue, or EMR.
- Proficiency in orchestrating workflows using Apache Airflow or similar tools.
- Solid understanding of data engineering concepts, including ETL/ELT design, data partitioning, and performance optimization.