Onsite
Senior AI Engineer - LLM Evaluation
AI Engineer
Senior AI Engineer specializing in evaluating and benchmarking large language models, implementing AI observability, and ensuring governance compliance across AWS and Azure environments.
About the role
Key Responsibilities
- Design and execute comprehensive evaluation frameworks for large language models (LLMs) to assess performance, safety, and bias.
- Develop automated benchmarking pipelines using Python and cloud services (AWS, Azure) to generate reproducible metrics.
- Implement AI observability tools that monitor model behavior in production, detect drift, and trigger alerts.
- Collaborate with governance teams to embed compliance checks, data provenance, and audit trails into the model lifecycle.
- Provide technical guidance and mentorship to junior engineers on LLM evaluation best practices and cloud infrastructure.
Requirements
- 5+ years of experience in AI/ML engineering with a focus on large language models.
- Strong proficiency in Python and experience building scalable evaluation pipelines.
- Hands‑on expertise with AWS and Azure services for AI workloads.
- Demonstrated knowledge of AI governance, model risk management, and observability techniques.
- Excellent problem‑solving skills and the ability to communicate complex technical concepts to cross‑functional teams.