remote

Engineering Manager, LLM Performance - NVIDIA

Engineering Manager

Lead a high‑impact team to accelerate inference for large language and vision models, driving performance improvements across TensorRT LLM, vLLM, and SGLang using deep technical expertise and hands‑on management.

About the role

Key Responsibilities

Define and execute the roadmap for next‑generation LLM/VLM/VLA inference software, ensuring scalability and low latency.
Lead, mentor, and grow a multidisciplinary engineering team of senior engineers and researchers.
Design and implement performance‑critical components in C++/CUDA, integrating with TensorRT, vLLM, and SGLang.
Collaborate with product, research, and hardware groups to align software optimizations with GPU architecture advances.
Establish best practices for profiling, benchmarking, and continuous performance improvement across distributed deployments.

Requirements

5+ years of software engineering experience, with at least 2 years in a technical leadership or engineering management role.
Deep expertise in C++, Python, and CUDA programming for high‑performance AI workloads.
Proven track record optimizing inference performance for large language or vision models on GPU platforms.
Strong understanding of distributed systems, profiling tools, and modern deep‑learning frameworks.
Excellent communication and collaboration skills, with the ability to influence cross‑functional teams.

Skills

python, c, cuda

Company: NVIDIA

Department: Research

Location: Santa Clara, California, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 431,250

Posted June 24, 2026