onsite

Infrastructure and MLOps Engineer - Graphcore

MLOps Engineer

Design, build, and operate scalable MLOps infrastructure for AI workloads, using Kubernetes, Docker, Terraform, and AWS to enable rapid model deployment and continuous integration on high‑performance compute.

About the role

Key Responsibilities

Architect and implement end‑to‑end MLOps pipelines that support training, validation, and serving of large‑scale AI models.
Deploy, manage, and optimize containerized workloads on Kubernetes clusters across on‑premise and cloud environments.
Automate infrastructure provisioning and configuration using Terraform and related IaC tools.
Integrate CI/CD workflows for model code, data, and artifacts, ensuring reproducibility and rapid iteration.
Monitor system performance, reliability, and cost, implementing observability solutions for GPU‑intensive workloads.
Collaborate with hardware, software, and data science teams to align infrastructure with evolving AI compute requirements.

Requirements

Strong experience with Python scripting for automation and orchestration.
Deep knowledge of Kubernetes, Docker, and container orchestration at scale.
Proficiency with Terraform or similar infrastructure‑as‑code tools, and with cloud platforms such as AWS.
Hands‑on experience building CI/CD pipelines for machine‑learning workflows.
Solid Linux systems administration skills and familiarity with GPU‑accelerated environments.

Skills

Python, Kubernetes, Docker, Terraform, AWS, CI/CD, Linux

Company: Graphcore

Department: Engineering

Location: Cambridge, United Kingdom

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 27, 2026