remote
Principal Software Engineering Manager - Substrate Efficiency - Microsoft
Engineering Manager
Lead a high‑impact team that builds and operates the massive AI inference platform for Microsoft 365 Copilot, focusing on low‑latency LLM APIs, GPU scaling, and substrate efficiency across global datacenters.
About the role
Key Responsibilities
- Lead and mentor a multidisciplinary engineering group responsible for the core LLM API, routing services, and substrate efficiency of a world‑scale AI inference platform.
- Design and implement high‑performance, low‑latency inference pipelines that run on thousands of GPUs across multiple datacenters.
- Drive architectural decisions around distributed systems, resource scheduling, and GPU utilization to meet strict SLAs for Microsoft 365 Copilot.
- Collaborate with research, product, and operations teams to translate cutting‑edge AI research into production‑ready services.
- Establish best practices for code quality, testing, observability, and continuous delivery in a cloud‑native environment.
Requirements
- 10+ years of software engineering experience, with at least 5 years in a leadership role building large‑scale, performance‑critical systems.
- Deep expertise in systems programming languages such as C++ and Rust, and strong scripting skills in Python.
- Hands‑on experience with GPU programming, CUDA, and optimizing workloads for massive parallelism.
- Proven track record designing, deploying, and operating Kubernetes‑based services at scale in cloud environments.
- Solid understanding of distributed systems, networking, and machine‑learning inference pipelines, preferably with large language models.
Skills
python, c, rust, kubernetes, machine learning