Platform Engineer, MLOps
As a Platform Engineer, MLOps, you will play a critical role in deploying and managing the cutting-edge infrastructure behind our AI/ML operations. You will collaborate with AI/ML engineers and researchers to build a robust CI/CD pipeline that supports safe, reproducible experiments. You will also set up and maintain the monitoring, logging, and alerting systems that oversee large-scale training runs and client-facing APIs. You will keep training environments highly available and efficiently managed across multiple clusters, and you will improve our containerization and orchestration systems using tools such as Docker and Kubernetes.
This role demands a proactive approach to maintaining large Kubernetes clusters, optimizing system performance, and providing operational support for our suite of software solutions. If you thrive on challenges and are motivated by the continuous pursuit of innovation, you will have the opportunity to make a significant impact in a dynamic, fast-paced environment.
Posted June 2, 2026