Onsite

Data Engineer - Python / Generative AI

AI Engineer

Build and maintain scalable data pipelines and distributed platforms for training and serving generative AI models, leveraging Python, SQL, and modern data engineering practices.

About the role

Key Responsibilities

Design, develop, and operate robust data pipelines that ingest, transform, and store large volumes of structured and unstructured data for LLM training.
Implement and maintain distributed processing frameworks (e.g., Apache Spark, Flink) to support high‑throughput data workflows.
Collaborate with AI researchers to provision data sets, feature stores, and model‑serving infrastructure for generative AI applications.
Optimize storage and query performance using SQL, columnar stores, and cloud data services.
Ensure data quality, lineage, and governance across the end‑to‑end pipeline.

Requirements

5+ years of experience in data engineering, with strong Python programming skills.
Hands‑on experience building and scaling distributed data systems (e.g., Spark, Kafka, Flink).
Familiarity with generative AI concepts, large language models, and the data requirements for model training and inference.
Proficiency in SQL and modern data storage technologies (e.g., Snowflake, BigQuery, Redshift, Delta Lake).
Experience with cloud platforms (AWS, GCP, or Azure) and container orchestration (Docker, Kubernetes) is a plus.

Skills

Python, Generative AI, SQL

Department: Engineering

Location: IN-MH-Pune, Maharashtra, India

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 26, 2026