onsite
Machine Learning Engineer - AI Model Benchmarking
ML Engineer
Lead independent AI benchmarking projects and design human baseline studies for open‑ended ML tasks. Use Python and data analysis to evaluate model performance, report actionable findings, and raise research quality while working in a sandboxed environment.
About the role
Key Responsibilities
- Design and execute human baseline experiments for open‑ended AI research tasks.
- Develop and maintain reproducible Python pipelines for data collection, preprocessing, and analysis.
- Interpret model outputs, generate benchmark metrics, and produce clear, actionable reports.
- Collaborate with research teams to refine evaluation protocols and improve benchmark relevance.
- Document methodologies, maintain version control, and ensure reproducibility of all experiments.
Requirements
- Strong background in machine learning, with hands‑on experience in Python and data‑analysis libraries (NumPy, pandas, scikit‑learn).
- Proven ability to design and conduct rigorous benchmark studies in AI research.
- Excellent analytical skills, with a focus on statistical evaluation and metric development.
- Self‑motivated, able to work independently in a sandboxed environment.
- Effective communication skills for presenting findings to technical and non‑technical audiences.
Skills
machine learning, python, data analysis