Onsite

Machine Learning Engineer - AI Model Benchmarking

ML Engineer

Lead independent AI benchmarking projects and design human baseline studies for open‑ended ML tasks. Use Python and advanced data analysis to evaluate model performance, deliver actionable insights, and uphold research quality in a sandboxed environment.

About the role

Key Responsibilities

Design and execute human baseline experiments for open‑ended AI research tasks.
Develop and maintain reproducible Python pipelines for data collection, preprocessing, and analysis.
Interpret model outputs, generate benchmark metrics, and produce clear, actionable reports.
Collaborate with research teams to refine evaluation protocols and improve benchmark relevance.
Document methodologies, maintain version control, and ensure reproducibility of all experiments.

Requirements

Strong background in machine learning, with hands‑on experience in Python and data‑analysis libraries (NumPy, pandas, scikit‑learn).
Proven ability to design and conduct rigorous benchmark studies in AI research.
Excellent analytical skills, with a focus on statistical evaluation and metric development.
Self‑motivated and able to work independently in a sandboxed environment.
Effective communication skills for presenting findings to technical and non‑technical audiences.

Skills

Machine learning, Python, data analysis

Department: Research

Location: United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-level

Posted June 23, 2026