onsite
Senior Infrastructure Software Engineer, Deep Learning Libraries - NVIDIA
ML Engineer
Senior engineer designing scalable infrastructure for NVIDIA's deep learning libraries (cuDNN, TensorRT, CUDA kernels), building modular build and test pipelines across autonomous vehicle and data‑center platforms.
About the role
Key Responsibilities
- Design and implement modular infrastructure that automates build, test, and deployment workflows for deep learning libraries such as cuDNN, TensorRT, and CUDA kernel collections.
- Develop and maintain CI/CD pipelines on Linux platforms, ensuring reproducible builds across heterogeneous hardware ranging from Drive AGX to DGX servers.
- Collaborate with library developers, performance engineers, and hardware teams to integrate new features, optimize build times, and improve test coverage.
- Create and evolve tooling for source control, dependency management, and artifact publishing, supporting both on‑premise and cloud‑based development environments.
- Diagnose and resolve infrastructure bottlenecks, scaling challenges, and reliability issues in large‑scale continuous integration systems.
Requirements
- 5+ years of software engineering experience with C++ and Python in high‑performance or systems programming.
- Deep knowledge of CUDA development, Linux kernel build processes, and GPU‑accelerated workloads.
- Hands‑on experience building and maintaining CI/CD pipelines (Jenkins, GitLab CI, or similar) and automated testing frameworks.
- Proficiency with build systems (CMake, Make, Bazel) and version control (Git) in large, multi‑repo codebases.
- Strong problem‑solving skills, ability to work cross‑functionally, and a track record of delivering robust infrastructure for complex software stacks.
Skills
C, Python, CUDA, Linux, CI/CD