Remote
Senior Manager, Cloud Services Platform - NVIDIA
Software Engineer
Lead the NGC Cloud team to design, automate, and scale global cloud operations, driving reliability and efficiency across NVIDIA’s accelerated computing services.
About the role
Key Responsibilities
- Lead the design and implementation of cloud‑native automation frameworks to streamline operational workflows across the NGC Cloud platform.
- Drive reliability and scalability initiatives that keep services highly available for global customers.
- Collaborate with engineering, security, and product teams to define best practices for cloud operations, monitoring, and incident response.
- Mentor and grow a high‑performing team of cloud engineers and SREs, fostering a culture of continuous improvement.
- Own the operational roadmap, prioritizing initiatives that deliver measurable business impact.
Requirements
- 10+ years of experience in cloud operations, DevOps, or SRE roles, with a proven track record of leading large, distributed teams.
- Deep expertise in Kubernetes, CI/CD pipelines, and cloud automation tools (e.g., Terraform, Ansible).
- Strong knowledge of AWS services and experience architecting large‑scale, multi‑region deployments.
- Excellent communication skills and the ability to influence cross‑functional stakeholders.
- Passion for innovation and a hands‑on approach to solving complex operational challenges.