onsite

Research Scientist, Agentic Data & Benchmarking - Institute of Foundation Models

Research Engineer

Lead research on agentic data pipelines and benchmarking for foundation models, designing scalable data collection systems, evaluation frameworks, and analysis tools with Python, PyTorch, and distributed computing.

About the role

Key Responsibilities

Design and implement data collection and curation pipelines for large‑scale foundation model training.
Develop rigorous benchmarking suites to evaluate model capabilities, safety, and alignment across diverse tasks.
Conduct statistical analysis and interpret results to guide model improvements and research directions.
Collaborate with researchers, data scientists, and engineers to integrate benchmarking feedback into model development cycles.
Publish findings in top conferences and contribute open‑source tools for the broader AI community.

Requirements

Ph.D. or equivalent experience in Machine Learning, Computer Science, Statistics, or a related field.
Strong programming skills in Python with hands‑on experience in PyTorch or TensorFlow.
Proven expertise in designing and executing large‑scale data benchmarks and statistical evaluation methods.
Experience with distributed computing frameworks (e.g., Ray, Spark) and handling petabyte‑scale datasets.
Track record of publishing research in top AI venues and contributing to open‑source projects.

Skills

Python, PyTorch, TensorFlow

Company: Institute of Foundation Models

Department: Research

Location: Sunnyvale, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 24, 2026