onsite

Senior Software Architect - Deep Learning and HPC Communications - NVIDIA

ML Engineer

Lead the design and evolution of GPU communication libraries for scaling deep learning and HPC workloads, driving architecture, performance, and integration across CUDA, NCCL, NVSHMEM, and UCX.

About the role

Key Responsibilities

Define and drive the architectural roadmap for GPU‑based communication libraries that enable large‑scale deep learning and HPC applications.
Collaborate with hardware, driver, and software teams to integrate and optimize NCCL, NVSHMEM, UCX, and related components.
Lead performance analysis, profiling, and tuning across multi‑GPU and multi‑node environments.
Mentor senior engineers, conduct design reviews, and establish best practices for scalable, low‑latency communication.
Contribute to open‑source and internal SDKs, ensuring robust APIs and documentation for developers.

Requirements

10+ years of software development experience with C++ and CUDA, focusing on high‑performance, GPU‑accelerated systems.
Deep expertise in deep learning frameworks and HPC communication patterns, including NCCL, NVSHMEM, and UCX.
Proven track record of architecting and optimizing large‑scale, low‑latency communication stacks.
Strong problem‑solving skills, ability to lead cross‑functional teams, and excellent written and verbal communication.
Experience with performance profiling tools and a solid understanding of parallel algorithms and memory hierarchies.

Skills

C, CUDA, Python, deep learning

Company: NVIDIA

Department: Research

Location: Austin, Texas, United States

Experience: 7+ years

Tenure: Full-time

Level: Lead

Salary: 431,250

Posted June 26, 2026