onsite

Principal Software Engineer, Performance Tooling - Microsoft

Software Engineer

Lead performance engineering for AI inference, optimizing large language model runtimes across diverse hardware using Python, C++, and CUDA to deliver world‑class speed and efficiency.

About the role

Key Responsibilities

Design, implement, and maintain high‑performance inference pipelines for large language models across supercomputers, servers, and edge devices.
Collaborate with hardware teams to profile, benchmark, and tune GPU/CPU kernels, leveraging CUDA and low‑level optimizations.
Drive end‑to‑end performance improvements, from algorithmic changes to system‑level resource scheduling.
Mentor and guide a small team of engineers, fostering a culture of continuous learning and technical excellence.
Integrate new AI frameworks and libraries, ensuring seamless deployment across multiple platforms.

Requirements

10+ years of software engineering experience with a focus on performance and scalability.
Deep expertise in C++ and CUDA, with strong knowledge of memory management and parallel programming.
Proven track record of optimizing AI inference workloads, including large language models.
Experience with profiling tools (Nsight, VTune, perf) and performance debugging.
Excellent communication skills and ability to work cross‑functionally in a fast‑paced environment.

Skills

python, c, cuda

Company: Microsoft

Department: Engineering

Location: Redmond, WA, United States

Experience: 7+ years

Tenure: full-time

Level: Lead

Salary: 304,200

Posted June 20, 2026