onsite
Senior Software Architect - Deep Learning and HPC Communications - NVIDIA
ML Engineer
Lead the design and evolution of GPU communication libraries for scaling deep learning and HPC workloads, driving architecture, performance, and integration across CUDA, NCCL, NVSHMEM, and UCX.
About the role
Key Responsibilities
- Define and drive the architectural roadmap for GPU‑based communication libraries that enable large‑scale deep learning and HPC applications.
- Collaborate with hardware, driver, and software teams to integrate and optimize NCCL, NVSHMEM, UCX, and related components.
- Lead performance analysis, profiling, and tuning across multi‑GPU and multi‑node environments.
- Mentor senior engineers, conduct design reviews, and establish best practices for scalable, low‑latency communication.
- Contribute to open‑source and internal SDKs, ensuring robust APIs and documentation for developers.
Requirements
- 10+ years of software development experience with C++ and CUDA, focusing on high‑performance, GPU‑accelerated systems.
- Deep expertise in deep learning frameworks, HPC communication patterns, and communication libraries such as NCCL, NVSHMEM, and UCX.
- Proven track record of architecting and optimizing large‑scale, low‑latency communication stacks.
- Strong problem‑solving skills, the ability to lead cross‑functional teams, and excellent written and verbal communication.
- Experience with performance profiling tools and a solid understanding of parallel algorithms and memory hierarchies.
Skills
C, CUDA, Python, Deep Learning