onsite
Data Engineer AWS & PySpark - Zensar Technologies
Data Engineer
Experienced Data Engineer skilled in building and optimizing PySpark applications on AWS services such as EMR, Glue, and S3, with strong Python programming, version control, and data‑warehousing expertise.
About the role
Key Responsibilities
- Design, develop, and maintain PySpark applications using Spark DataFrames to process large‑scale datasets.
- Optimize Spark jobs for performance and cost efficiency on Amazon EMR.
- Integrate data pipelines with AWS analytics services including Athena, Glue, and Lambda.
- Implement data storage and retrieval solutions using Amazon S3, EC2, and related services.
- Collaborate with cross‑functional teams, using Git for version control and following CI/CD best practices.
Requirements
- 5+ years of hands‑on experience with Python and PySpark.
- Proficiency in the AWS ecosystem (EMR, Glue, Athena, Lambda, EC2, S3, SNS).
- Strong understanding of data‑warehousing concepts such as dimensions, facts, and schema design (star/snowflake).
- Experience with Git or similar version‑control systems.
- Ability to troubleshoot and tune large‑volume data processing jobs.