About the Role
Riot Games is building the next generation of its ML Platform to support AI and machine learning systems across game development, player experiences, and internal tools. As a Staff Machine Learning Engineer on the ML Platform team, you will help design and scale the infrastructure powering a wide range of ML workloads and evolving AI systems across Riot. The ML Platform team builds and operates the shared infrastructure behind Riot’s AI ecosystem, including model serving, orchestration, feature management, and deployment workflows. Your goal will be to help teams across Riot move ML systems from experimentation into reliable production services quickly and confidently. You will architect systems for model deployment, observability, and lifecycle management while helping shape the long-term direction of Riot’s ML platform. You will apply modern MLOps practices to improve reliability, scalability, and developer experience for teams building AI-powered products across Riot. You will report to the Engineering Manager.
Responsibilities
- Design and operate AI & ML inference infrastructure, including deployment pipelines and CPU/GPU-aware orchestration
- Develop CI/CD workflows that enable rapid iteration and safe promotion from development to production
- Optimize infrastructure supporting varied model architectures, from foundation models to gradient boosted trees, for high throughput, low latency, and high availability
- Establish and evolve ML deployment best practices, including multi-version models, blue/green rollouts, shadow deployments, and rollback strategies
- Improve developer experience by reducing operational complexity and simplifying platform onboarding
- Influence long-term platform architecture and help shape technical direction across Riot’s ML ecosystem
- Collaborate with researchers and game teams to understand product needs and build reusable platform capabilities
- Use modern AI-assisted development tools and workflows thoughtfully to accelerate iteration, while maintaining engineering quality and reliability
Required Qualifications
- 6+ years of engineering experience, including time on ML/AI, platform, or infrastructure teams
- Experience operating inference platforms such as KServe, and production ML infrastructure such as Feast, Milvus, or similar open-source systems
- Experience with one or more inference serving frameworks, including NVIDIA Triton/Dynamo, TorchServe, or similar systems
- Familiarity with GPU orchestration, performance tuning, and cost-aware scheduling
- Experience with CI/CD workflows, infrastructure-as-code (e.g., Terraform), and artifact management
- Experience building and operating services within distributed or service-oriented architectures
Desired Qualifications
- Experience building ML infrastructure within a real-time or latency-sensitive environment
- Hands-on experience optimizing ML & AI deployments (LLMs, diffusion models, etc.) for throughput, latency, and reliability
- Familiarity with agentic workflows and orchestration frameworks for LLM-based systems
- Passion for player experience, game systems, or creative technology development