Remote

AI Quality Engineer - Momentive Software

Software Engineer

Design and maintain evaluation frameworks and automated test pipelines for LLM‑driven AI systems, measuring accuracy, safety, latency, and hallucination rates, in collaboration with engineering and product teams.

About the role

Key Responsibilities

Design and implement evaluation frameworks to measure LLM and agentic AI quality across accuracy, consistency, safety, and task completion.
Build and maintain automated test pipelines (unit, integration, end‑to‑end) for AI features and agentic workflows.
Develop tooling to detect regressions in model behavior, prompt outputs, and decision‑making across releases.
Define, track, and report AI quality metrics such as hallucination rate, tool‑use accuracy, latency, and failure‑recovery performance.
Partner with engineers, data scientists, and product stakeholders to surface findings and drive continuous improvement.

Requirements

Strong programming skills in Python and experience with testing frameworks (e.g., pytest, unittest).
Hands‑on experience evaluating and debugging Large Language Models or other generative AI systems.
Proficiency in building CI/CD pipelines and automated test infrastructure.
Solid understanding of machine‑learning concepts, prompt engineering, and model performance metrics.
Excellent analytical and communication skills to translate technical findings into actionable insights.

Skills

Python, machine learning, CI/CD

Company: Momentive Software

Department: Engineering

Location: Atlanta, Georgia, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 26, 2026