On-site
Generative AI Data Engineer III
AI Engineer
Senior data engineer specializing in generative AI pipelines, building scalable data models, ETL workflows, and cloud infrastructure using Python, Spark, Airflow, and AWS services.
About the role
Key Responsibilities
- Design, develop, and maintain end‑to‑end data pipelines that feed generative AI models, ensuring high throughput and low latency.
- Implement robust data modeling and schema design to support large‑scale AI training and inference workloads.
- Build and orchestrate ETL/ELT workflows using Apache Airflow and Spark, integrating diverse data sources and formats.
- Deploy and manage cloud infrastructure on AWS (S3, Redshift, EMR, Lambda) to support scalable AI data processing.
- Collaborate with AI researchers and product teams to translate model requirements into reliable data solutions.
- Monitor pipeline performance, troubleshoot issues, and continuously optimize for cost and efficiency.
Requirements
- 5+ years of professional experience in data engineering, with a focus on AI/ML data pipelines.
- Strong proficiency in Python and SQL, and hands‑on experience with Apache Spark.
- Expertise in workflow orchestration tools such as Apache Airflow or a comparable scheduler.
- Deep knowledge of AWS services for data storage, processing, and orchestration.
- Solid understanding of data modeling, schema design, and best practices for large‑scale data systems.
Skills
Python, SQL, Apache Spark, Airflow, AWS