Remote / Onsite

Python PySpark AWS - CGI

Software Engineer

Lead data engineering projects using Python and PySpark on AWS, designing scalable pipelines, optimizing performance, and ensuring data quality for enterprise clients.

About the role

Key Responsibilities

Design, develop, and maintain large-scale data pipelines using Python and PySpark on AWS services such as EMR, S3, and Redshift.
Collaborate with data scientists and business stakeholders to translate analytical requirements into robust ETL solutions.
Optimize Spark jobs for performance and cost, implementing best practices for partitioning, caching, and resource allocation.
Implement data quality checks, monitoring, and alerting to ensure reliability and compliance with data governance standards.
Document architecture, code, and processes, and mentor junior engineers on Spark and AWS best practices.

Requirements

5+ years of experience in data engineering with strong proficiency in Python and PySpark.
Hands‑on experience deploying and managing Spark workloads on AWS (EMR, Glue, Lambda).
Solid understanding of SQL, relational and NoSQL databases, and data modeling.
Experience with CI/CD pipelines, version control (Git), and automated testing.
Excellent problem‑solving skills and ability to work in a fast‑paced, collaborative environment.

Skills

Python, AWS, SQL

Company: CGI

Department: Engineering

Location: Telangana, India

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 20, 2026