Onsite
Principal Software Engineer, Performance Tooling
Software Engineer
About the role
Lead the design and implementation of high‑performance tooling for deep‑learning workloads, using C++ and CUDA to optimize distributed training and inference systems across GPU‑accelerated architectures.
Key Responsibilities
- Architect and develop scalable performance‑analysis tools for GPU‑accelerated deep‑learning pipelines.
- Collaborate with systems and hardware teams to identify bottlenecks in distributed DNN training and inference.
- Implement low‑level C++ and CUDA optimizations that reduce latency and increase throughput.
- Design and maintain instrumentation frameworks that capture fine‑grained metrics across multi‑node clusters.
- Mentor junior engineers and drive best practices in performance engineering.
Requirements
- 10+ years of software engineering experience, with deep expertise in C++ and CUDA.
- Proven track record in distributed systems and high‑performance computing.
- Strong understanding of computer architecture, GPU internals, and deep‑learning frameworks.
- Excellent problem‑solving skills and the ability to translate complex performance data into actionable insights.
- Effective communication skills and a collaborative mindset.