On-site
Research Scientist, Agentic Data & Benchmarking - Institute of Foundation Models
Research Engineer
Lead research on agentic data pipelines and benchmarking for foundation models, designing scalable data collection pipelines, evaluation frameworks, and analysis tools using Python, PyTorch, and distributed computing.
About the role
Key Responsibilities
- Design and implement data collection and curation pipelines for large‑scale foundation model training.
- Develop rigorous benchmarking suites to evaluate model capabilities, safety, and alignment across diverse tasks.
- Conduct statistical analysis and interpret results to guide model improvements and research directions.
- Collaborate with researchers, data scientists, and engineers to integrate benchmarking feedback into model development cycles.
- Publish findings in top conferences and contribute open‑source tools for the broader AI community.
Requirements
- Ph.D. or equivalent experience in Machine Learning, Computer Science, Statistics, or a related field.
- Strong programming skills in Python with hands‑on experience in PyTorch or TensorFlow.
- Proven expertise in designing and executing large‑scale data benchmarks and statistical evaluation methods.
- Experience with distributed computing frameworks (e.g., Ray, Spark) and handling petabyte‑scale datasets.
- Track record of publishing research in top AI venues and contributing to open‑source projects.
Skills
Python, PyTorch, TensorFlow