We’re looking for an AI Engineer to help us design, evaluate, and scale our next generation of LLM-powered agents. This role is deeply technical, but also product- and customer-oriented: you’ll build datasets, run evaluations, improve model performance, and ensure our agents deliver real value in real workflows.
As an AI Engineer, you’ll work across Engineering, Product, and Customer teams to continuously improve our agent quality, reliability, and speed. You won’t just build agents — you’ll build the systems, metrics, and feedback loops that make those agents better over time.
Responsibilities
- Build, curate, and maintain agentic workflows along with datasets for training and evaluation.
- Run systematic LLM evaluations, track regressions, and ensure models meet quality bars.
- Define and implement LLM performance metrics (e.g., correctness, latency, hallucination control, safety).
- Experiment with prompting techniques, fine-tuning datasets, retrieval strategies, and model configurations, working closely with Product and Customer-facing teams to ensure agent behaviour aligns with user expectations.
- Develop internal tooling to speed up evaluation, annotation, and iteration cycles.
- Build automated pipelines for regression checks and model monitoring.
- Create mechanisms that turn real user interactions into actionable model improvements.
- Ensure agents behave consistently across large-scale production scenarios.
- Debug complex system behaviours spanning prompts, tools, APIs, and model responses.
- Understand real-world use cases and failure modes, and translate customer insights into model needs, data requirements, and product improvements.
- Tune latency, turn-taking, and conversational naturalness for voice AI systems.
- Write clean, reliable code in Python or JS for model pipelines, tools, and integrations.
- Think in systems: from data ingestion to model outputs to user-facing behaviour.
Success Looks Like
- Strong evaluation coverage with clear, actionable metrics.
- Faster iteration cycles due to improved internal tools and workflows.
- Measurable improvement in agent accuracy, consistency, and safety.
- Reliable agent performance in production — fewer escalations, fewer regressions.
- Clear alignment between customer needs and agent capabilities.
- Smooth collaboration across Engineering, Product, and Customer teams.
What We’re Looking For
- 2–5 years of experience in software engineering, with strong fundamentals in building reliable, scalable systems.
- 1–2 years of experience working on AI-backed products (LLM engineering, evaluations, prompting, or agent development).
- Hands-on experience with LLMs, prompt engineering, evaluation frameworks, or dataset creation.
- Strong understanding of how LLMs work: prompting