onsite
Data Engineer - Python / Generative AI
AI Engineer
Build and maintain scalable data pipelines and distributed platforms for training and serving generative AI models, leveraging Python, SQL, and modern data engineering practices.
About the role
Key Responsibilities
- Design, develop, and operate robust data pipelines that ingest, transform, and store large volumes of structured and unstructured data for LLM training.
- Deploy and maintain distributed processing frameworks (e.g., Apache Spark, Apache Flink) to support high‑throughput data workflows.
- Collaborate with AI researchers to provision datasets, feature stores, and model‑serving infrastructure for generative AI applications.
- Optimize storage and query performance using SQL, columnar stores, and cloud data services.
- Ensure data quality, lineage, and governance across the end‑to‑end pipeline.
Requirements
- 5+ years of experience in data engineering, with strong Python programming skills.
- Hands‑on experience building and scaling distributed data systems (e.g., Spark, Kafka, Flink).
- Familiarity with generative AI concepts, large language models, and the data requirements for model training and inference.
- Proficiency in SQL and modern data storage technologies (e.g., Snowflake, BigQuery, Redshift, Delta Lake).
- Experience with cloud platforms (AWS, GCP, or Azure) and with containerization and orchestration (Docker, Kubernetes) is a plus.
Skills
Python, Generative AI, SQL