Onsite
Principal Software Engineer, Performance Tooling - Microsoft
Software Engineer
Lead performance engineering for AI inference, optimizing large language model runtimes across diverse hardware using Python, C++, and CUDA to deliver world‑class speed and efficiency.
About the role
Key Responsibilities
- Design, implement, and maintain high‑performance inference pipelines for large language models across supercomputers, servers, and edge devices.
- Collaborate with hardware teams to profile, benchmark, and tune GPU/CPU kernels using CUDA and other low‑level optimization techniques.
- Drive end‑to‑end performance improvements, from algorithmic changes to system‑level resource scheduling.
- Mentor and guide a small team of engineers, fostering a culture of continuous learning and technical excellence.
- Integrate new AI frameworks and libraries, ensuring seamless deployment across multiple platforms.
Requirements
- 10+ years of software engineering experience with a focus on performance and scalability.
- Deep expertise in C++ and CUDA, with strong knowledge of memory management and parallel programming.
- Proven track record optimizing AI inference workloads, including large language models.
- Experience with profiling tools (Nsight, VTune, perf) and performance debugging.
- Excellent communication skills and ability to work cross‑functionally in a fast‑paced environment.