Remote
Generative AI Data Scientist
Data Scientist
Data Scientist specializing in Generative AI, building and scaling data pipelines with Apache Spark on cloud platforms to clean, process, and analyze large datasets for AI model development.
About the role
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using Apache Spark on cloud infrastructure.
- Perform data cleaning, transformation, and feature engineering to support generative AI model training.
- Collaborate with AI researchers to integrate processed data into large‑scale generative models.
- Implement data quality checks and monitoring to ensure reliability of production pipelines.
- Optimize data workflows for cost‑effective cloud execution and rapid experimentation.
Requirements
- Strong proficiency in Python and Spark (PySpark or Scala).
- Hands‑on experience with cloud platforms (AWS, Azure, or GCP) for data engineering.
- Solid background in analyzing, processing, and cleaning large, heterogeneous datasets.
- Familiarity with generative AI concepts and model development pipelines.
- Ability to work cross‑functionally with data scientists, engineers, and product teams.
Skills
Python, Apache Spark, Data Analysis