Onsite
LLM Evaluation Product Operations Specialist
Research Engineer
Lead product operations for large language model (LLM) evaluation: design automated tests, collaborate across teams, and analyze evaluation data to refine metrics and improve product quality.
About the role
Key Responsibilities
- Design, implement, and maintain automated testing frameworks for large language model (LLM) evaluation pipelines.
- Collaborate with engineering, research, and product teams to define evaluation metrics and success criteria.
- Collect, clean, and analyze evaluation data to uncover insights and recommend improvements.
- Document test cases, results, and best practices to support continuous integration and deployment.
- Monitor and report on evaluation performance, ensuring alignment with business goals and compliance standards.
Requirements
- Proven experience in automated testing and test framework development.
- Strong analytical skills, with proficiency in Python and SQL for data analysis.
- Excellent cross‑functional communication and stakeholder management.
- Familiarity with LLMs, evaluation methodologies, and metric design.
- Detail‑oriented mindset and ability to thrive in a fast‑paced environment.