Remote / Onsite
Python PySpark and AWS Developer - CGI
Software Engineer
Develop and maintain data pipelines using Python, PySpark, and AWS services, ensuring high‑performance data processing and reliable ETL workflows for enterprise analytics initiatives.
About the role
Key Responsibilities
- Design, develop, and optimize scalable data pipelines using Python and PySpark on AWS.
- Implement ETL processes to ingest, transform, and load data from diverse sources into data lakes or warehouses.
- Leverage AWS services such as S3, Glue, Lambda, and Redshift to build robust, cost‑effective solutions.
- Collaborate with data architects and analysts to define data models, schemas, and quality standards.
- Monitor pipeline performance, troubleshoot issues, and apply best practices for security and compliance.
Requirements
- Strong proficiency in Python programming and PySpark for large‑scale data processing.
- Hands‑on experience with core AWS services (S3, Glue, Lambda, Redshift, IAM).
- Solid understanding of SQL and relational database concepts.
- Experience building and maintaining ETL workflows in a cloud environment.
- Ability to work independently and within cross‑functional teams, delivering high‑quality code on schedule.