onsite
Senior Software Engineer - AI Infrastructure
As a Senior Software Engineer - AI Infrastructure at NVIDIA, you will design, develop, and maintain highly scalable and reliable distributed systems for AI/ML workloads. You will work with cutting-edge technologies like Kubernetes, Docker, and gRPC, using C++, Python, or Go to build a robust cloud-based AI platform.
About the role
Join the NVIDIA GPU Cloud (NGC) team and help advance AI/ML development. As a Senior Software Engineer, you'll build scalable, reliable, and secure AI infrastructure and contribute to a platform that empowers researchers and developers worldwide.
What you'll be doing:
- Design, develop, and maintain highly scalable and reliable distributed systems for AI/ML workloads.
- Implement and optimize services using C++, Python, and Go.
- Work with containerization and orchestration technologies like Docker and Kubernetes.
- Develop and integrate APIs using technologies like gRPC.
- Manage and optimize SQL and NoSQL data storage solutions.
- Troubleshoot and debug complex issues across various layers of the infrastructure.
- Collaborate with cross-functional teams to define and implement new features.
- Contribute to the overall architecture and design of our cloud-based AI platform.
- Mentor junior engineers and promote best practices in software development.
What we need to see:
- BS or MS in Computer Science, Computer Engineering, or a related field.
- 5+ years of experience in software development, with a focus on building large-scale distributed systems.
- Proficiency in C++, Python, or Go.
- Strong understanding of Kubernetes and containerization technologies.
- Experience with cloud platforms and microservices architecture.
- Solid grasp of data structures, algorithms, and object-oriented design.
- Ability to work in a fast-paced, agile development environment.
- Excellent problem-solving and debugging skills.
Ways to stand out from the crowd:
- Experience with machine learning frameworks and AI development tools.
- Familiarity with various data storage solutions (e.g., object storage, distributed file systems).
- Contributions to open-source projects.
- Strong communication and interpersonal skills.