Work arrangement: Onsite
Software Engineer - Model Evaluation & Benchmarking
Lead the design and implementation of automated evaluation pipelines for multimodal AI models. You will use Python and rigorous benchmarking to ensure reliability and quality across image, video, and generative systems.
About the role
Key Responsibilities
- Design and maintain end‑to‑end evaluation pipelines for multimodal generative and vision‑based models.
- Develop automated benchmarking suites that assess realism, consistency, and quality across image, video, and multimodal outputs.
- Collaborate with applied science, infrastructure, and product teams to define evaluation metrics and data requirements.
- Implement dataset‑driven testing frameworks and performance validation pipelines, and integrate them into CI/CD workflows.
- Analyze evaluation results, provide actionable insights, and drive continuous improvement of model quality.
Requirements
- Strong programming skills in Python with experience building scalable data pipelines.
- Hands‑on experience with machine learning model evaluation, benchmarking, and metrics definition.
- Familiarity with multimodal AI systems (image, video, text) and generative models.
- Proficiency in version control, containerization (Docker), and CI/CD practices.
- Excellent analytical and communication skills, with the ability to translate technical findings into product‑ready insights.
Skills
Python, machine learning