onsite

LLM Ops Engineer

Research Engineer

Own the operation of large language model deployments on AWS, keeping services scalable, reliable, and well monitored through A/B testing, autoscaling, and robust alerting.

About the role

Key Responsibilities

Design, implement, and maintain scalable LLM deployment pipelines on AWS, leveraging autoscaling and load balancing.
Configure and manage A/B testing frameworks to evaluate model variants and performance metrics.
Set up comprehensive alerting and monitoring solutions to detect anomalies and ensure high availability.
Collaborate with data scientists and ML engineers to integrate new models into production workflows.
Automate deployment processes using CI/CD tools, ensuring rapid and reliable releases.

Requirements

Proven experience with AWS services (ECS/EKS, Lambda, CloudWatch, Auto Scaling).
Strong background in DevOps practices, CI/CD pipelines, and infrastructure as code.
Hands-on knowledge of A/B testing methodologies and performance monitoring.
Excellent scripting skills (Python, Bash) and familiarity with containerization.
Ability to troubleshoot complex production issues and optimize system performance.

Skills

aws, cicd

Department: Research

Location: Melbourne, Australia

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 22, 2026