remote
AI Quality Engineer - Momentive Software
Software Engineer
Design and maintain evaluation frameworks and automated test pipelines for LLM‑driven AI systems, measuring accuracy, safety, latency, and hallucination rates while collaborating with engineering and product teams.
About the role
Key Responsibilities
- Design and implement evaluation frameworks to measure LLM and agentic AI quality across accuracy, consistency, safety, and task completion.
- Build and maintain automated test pipelines (unit, integration, end‑to‑end) for AI features and agentic workflows.
- Develop tooling to detect regressions in model behavior, prompt outputs, and decision‑making across releases.
- Define, track, and report AI quality metrics such as hallucination rate, tool‑use accuracy, latency, and failure‑recovery performance.
- Partner with engineers, data scientists, and product stakeholders to surface findings and drive continuous improvement.
Requirements
- Strong programming skills in Python and experience with testing frameworks (e.g., pytest, unittest).
- Hands‑on experience evaluating and debugging Large Language Models or other generative AI systems.
- Proficiency in building CI/CD pipelines and automated test infrastructure.
- Solid understanding of machine‑learning concepts, prompt engineering, and model performance metrics.
- Excellent analytical and communication skills to translate technical findings into actionable insights.
Skills
Python, machine learning, CI/CD