remote

Deep Learning Software Engineer, TensorRT Performance - New College Grad 2026

NVIDIA is seeking a Deep Learning Software Engineer, TensorRT Performance to analyze and enhance the performance of its deep learning inference ecosystem. The role involves developing benchmarking methodologies, contributing to inference frameworks like TensorRT, and optimizing deep learning models across various NVIDIA accelerators to set the gold standard for Generative AI performance.

About the Role

NVIDIA is looking for a Deep Learning Software Engineer, TensorRT Performance to join its rapidly growing research and development team for Deep Learning Inference. This role focuses on analyzing and improving the performance of NVIDIA’s inference ecosystem. Companies worldwide leverage NVIDIA GPUs for deep learning, driving breakthroughs in Generative AI, Recommenders, and Vision. The successful candidate will join a team dedicated to building software for performance optimization, deployment, and serving of DL inference solutions, specializing in GPU-accelerated deep learning inference software like TensorRT, DL benchmarking, and performant model deployment solutions.

You will collaborate with the deep learning community to integrate TensorRT into OSS frameworks like TensorRT-EdgeLLM and PyTorch. Key responsibilities include identifying performance opportunities, optimizing state-of-the-art models across NVIDIA accelerators (from datacenter GPUs to edge SoCs), and implementing graph compiler algorithms, frontend operators, and code generators within NVIDIA’s inference ecosystem. You will also work with various teams on workflow improvements, performance modeling, analysis, kernel development, and inference software development.

What you'll be doing:

Establish groundbreaking performance benchmarking methodologies and analysis workflows to identify performance issues and opportunities for NVIDIA’s inference ecosystem (e.g., TensorRT/TensorRT-EdgeLLM/Torch-TensorRT).
Contribute features and code to NVIDIA/OSS inference frameworks, including but not limited to TensorRT/TensorRT-EdgeLLM/Torch-TensorRT.
Develop new model pipelines for NVIDIA’s inference ecosystem with optimized performance, covering areas like quantization, scheduling, memory management, and distributed inference, to set the gold standard for Gen AI performance.
Collaborate with teams inside and outside of NVIDIA across generative AI, automotive, robotics, image understanding, and speech understanding to set direction and develop innovative inference solutions.
Scale performance of deep learning models across different architectures and types of NVIDIA accelerators.

What we need to see:

Bachelor's, Master's, or PhD degree, or equivalent experience, in a relevant field (Computer Science, Computer Engineering, EECS, AI).
2+ years of relevant software development experience.
Strong C++ and Python programming and software engineering skills.
Experience with DL frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX) and inference libraries (e.g., TensorRT, TensorRT-LLM, vLLM, SGLang, FlashInfer).
Experience with performance analysis and performance optimization.

Ways to stand out from the crowd:

Strong foundation and architectural knowledge of GPUs.
Deep understanding of modern deep learning models and workloads (e.g., Transformers, Recommenders, ASR, TTS, Visual Understanding).
Proficiency in at least one of the domain-specific languages for deep learning programming (e.g., CUDA/TileIR/CuTeDSL/CUTLASS/Triton).
Prior contributions to major LLM inference frameworks (e.g., vLLM) or prior experience with graph compilers in deep learning inference (e.g., TorchDynamo/TorchInductor).
Prior experience optimizing performance for low-latency, resource-constrained systems or embedded AI pipelines (e.g., Jetson systems or other edge AI accelerators).
