onsite
Lead Systems Software Test Engineer - CSP Engagements - NVIDIA
QA Engineer
Lead the validation of NVIDIA's datacenter ML stack for cloud service providers, designing test strategies, automating full‑stack tests, and collaborating with hardware and software teams to ensure high‑performance training and inference platforms.
About the role
Key Responsibilities
- Define and own test strategies and validation plans for CSP integrations of NVIDIA datacenter products.
- Develop and maintain automated test frameworks and CI/CD pipelines for full‑stack (cluster‑to‑rack) verification.
- Execute performance, stability, and scalability tests on ML workloads, analyzing results to drive hardware‑software optimizations.
- Collaborate with hardware, firmware, and software engineering teams to troubleshoot issues and validate fixes.
- Provide technical guidance and mentorship to junior test engineers and act as the primary point of contact for CSP customers.
Requirements
- 5+ years of systems software testing experience, preferably with GPU‑accelerated or ML workloads.
- Strong programming skills in Python and C++ in Linux environments.
- Hands‑on experience with CUDA, driver stacks, and performance profiling tools.
- Proficiency in building and maintaining test automation frameworks and CI/CD systems.
- Excellent problem‑solving and communication skills, with a track record of customer‑facing technical support.
Skills
Python, C, Linux, CUDA, test automation, CI/CD