Technical Program Manager - Cluster Orchestration & Applied Training
Staff Technical Program Manager role at CoreWeave focused on Cluster Orchestration and Applied Training, drawing on skills in Python, AWS, and machine learning to help accelerate AI breakthroughs and innovation.
CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.
What You’ll Do:
CoreWeave is seeking a Staff Technical Program Manager to lead complex, cross-functional programs across Cluster Orchestration and Applied Training within our AI/ML Platform Services organization.
Cluster Orchestration is the platform layer that ensures large AI workloads are scheduled, launched, and managed reliably across CoreWeave’s clusters. Applied Training is the layer built on top of that infrastructure; it helps researchers and customers use the clusters for pre-training, fine-tuning, reinforcement learning, evaluations, and sandboxed experimentation.
About the role:
In this role, you will partner with engineering, product, infrastructure, and research-adjacent teams to improve both how workloads run on the cluster and how users interact with the training platform built on top of it. That includes driving programs across orchestration systems such as Slurm-on-Kubernetes (SUNK), Kueue, and workflow integrations. It also includes helping scale the environments, tooling, and operational mechanisms that make training and evaluation workflows easier to use.
This is a highly cross-functional role for a TPM who combines strong technical depth, excellent execution instincts, and the ability to bring structure and clarity to fast-moving infrastructure and AI platform initiatives.
Responsibilities:
Posted June 6, 2026