remote
Senior Platform Engineer - Machine Learning Platform - Roblox
ML Engineer
Lead the design, development, and scaling of Roblox's machine‑learning platform, building robust, cloud‑native services with Python, Java, Kubernetes, and AWS that support creators worldwide.
About the role
Key Responsibilities
- Architect, build, and operate a highly available, low‑latency ML platform that serves millions of daily users.
- Design and implement scalable microservices using Python and Java, containerized with Docker and orchestrated by Kubernetes.
- Integrate the platform with AWS services (EKS, S3, SageMaker) and ensure cost‑effective, secure cloud operations.
- Collaborate with data scientists and product teams to create end‑to‑end pipelines for model training, deployment, and monitoring.
- Establish CI/CD best practices, automated testing, and observability tooling to maintain platform reliability.
Requirements
- 5+ years of experience building large‑scale distributed systems, preferably in a cloud environment.
- Strong proficiency in Python and Java, with hands‑on experience in containerization (Docker) and orchestration (Kubernetes).
- Deep understanding of AWS services and infrastructure‑as‑code practices.
- Experience designing and operating machine‑learning pipelines or platforms.
- Proven ability to work cross‑functionally, mentor engineers, and drive technical excellence.
Skills
Python, Java, Kubernetes, AWS, machine learning