onsite
Data Engineer (Python & PySpark) - Tata Consultancy Services (TCS)
Data Engineer
About the role
Seeking a Data Engineer skilled in Python and PySpark to design, build, and maintain ETL/ELT pipelines, ingest data from diverse sources, and implement data lake and data warehouse solutions using Spark SQL and cloud storage.
Key Responsibilities
- Design, develop, and optimize data processing pipelines using Python, PySpark, and Spark SQL.
- Build and maintain robust ETL/ELT workflows for structured and unstructured data.
- Ingest data from databases, files, APIs, and cloud storage platforms such as AWS S3.
- Implement data lake and data warehouse architectures, ensuring data quality and governance.
- Collaborate with data analysts and engineers to translate business requirements into scalable data solutions.
Requirements
- Strong hands‑on experience with Python and PySpark for large‑scale data engineering.
- Proficiency in Spark SQL and related big‑data frameworks.
- Demonstrated ability to build end‑to‑end ETL/ELT pipelines and data ingestion processes.
- Experience working with both structured and unstructured data sources.
- Familiarity with data lake concepts and cloud storage services, particularly AWS S3.