Remote
Engineering Manager, LLM Performance - NVIDIA
About the Role
Lead a high‑impact team that accelerates inference for large language and vision models, driving performance improvements across TensorRT-LLM, vLLM, and SGLang through deep technical expertise and hands‑on management.
Key Responsibilities
- Define and execute the roadmap for next‑generation LLM/VLM/VLA inference software, ensuring scalability and low latency.
- Lead, mentor, and grow a multidisciplinary engineering team of senior engineers and researchers.
- Design and implement performance‑critical components in C++/CUDA, integrating with TensorRT, vLLM, and SGLang.
- Collaborate with product, research, and hardware groups to align software optimizations with GPU architecture advances.
- Establish best practices for profiling, benchmarking, and continuous performance improvement across distributed deployments.
Requirements
- 5+ years of software engineering experience, with at least 2 years in a technical leadership or engineering management role.
- Deep expertise in C++, Python, and CUDA programming for high‑performance AI workloads.
- Proven track record optimizing inference performance for large language or vision models on GPU platforms.
- Strong understanding of distributed systems, profiling tools, and modern deep‑learning frameworks.
- Excellent communication and collaboration skills, with the ability to influence cross‑functional teams.