Remote

Generative AI Data Scientist

Data Scientist

We are seeking a Data Scientist specializing in generative AI to build and scale data pipelines with Apache Spark on cloud platforms. The role involves cleaning, processing, and analyzing large datasets for AI model development.

About the role

Key Responsibilities

Design, develop, and maintain scalable data pipelines using Apache Spark on cloud infrastructure.
Perform data cleaning, transformation, and feature engineering to support generative AI model training.
Collaborate with AI researchers to integrate processed data into the training of large‑scale generative models.
Implement data quality checks and monitoring to ensure reliability of production pipelines.
Optimize data workflows for cost‑effective cloud execution and rapid experimentation.

Requirements

Strong proficiency in Python and Spark (PySpark or Scala).
Hands‑on experience with cloud platforms (AWS, Azure, or GCP) for data engineering.
Solid background in data analysis, processing, and cleaning of large, heterogeneous datasets.
Familiarity with generative AI concepts and model development pipelines.
Ability to work cross‑functionally with data scientists, engineers, and product teams.

Skills

Python, Apache Spark, data analysis

Department: Research

Location: United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 22, 2026