onsite
Senior Staff Engineer, Platform Infrastructure
NVIDIA is seeking a Senior Staff Platform Infrastructure Engineer to join the Enterprise Software Infrastructure Platform team. In this role, you will lead the design, development, and maintenance of core platform infrastructure components and services, driving best practices for scalability, reliability, and security. You will also collaborate with cross-functional teams, mentor junior engineers, and provide operational support for critical systems.
About the role
Join the NVIDIA Enterprise Software Infrastructure Platform team as a Senior Staff Platform Infrastructure Engineer. We are building a best-in-class platform to enable accelerated computing for a variety of Enterprise applications. Our team is rapidly growing and passionate about building the infrastructure that powers NVIDIA's Enterprise AI software offerings. As a Senior Staff Engineer, you will be a key contributor to our core platform and infrastructure, designing and implementing features that empower our engineering teams to build and deliver cutting-edge software products.
What you'll be doing:
- Lead the design, development, and maintenance of core platform infrastructure components and services.
- Drive the adoption of best practices for scalability, reliability, security, and performance across the platform.
- Collaborate with cross-functional teams to understand their infrastructure needs and provide innovative solutions.
- Mentor junior engineers and contribute to a culture of technical excellence and continuous improvement.
- Participate in on-call rotations and provide operational support for critical infrastructure systems.
What we need to see:
- BS, MS, or PhD in Computer Science or a related technical field, or equivalent experience.
- 8+ years of experience in software development with a focus on platform infrastructure, distributed systems, or site reliability engineering.
- Strong proficiency in one or more programming languages such as C++, Java, Python, Go, or Rust.
- Expertise with cloud infrastructure technologies (e.g., Kubernetes, Docker, public cloud platforms).
- Experience with version control and CI/CD pipelines and tools (e.g., Git, Jenkins, TeamCity).
- Deep understanding of microservices architecture, distributed systems, and SRE principles.
- Excellent problem-solving skills, with a track record of tackling complex technical challenges.
- Strong communication and collaboration skills, with the ability to influence technical direction.
Ways to stand out from the crowd:
- Experience building and operating large-scale, highly available distributed systems.
- Contributions to open-source projects related to platform infrastructure.
- Familiarity with observability tools and practices (monitoring, logging, alerting).
- Experience working in an Agile/Scrum development environment.