As a Software Engineer, Infrastructure Reliability at openai, you will be responsible for designing and implementing scalable and reliable infrastructure for our cutting-edge AI systems. You will work closely with our engineering teams to ensure the stability and performance of our infrastructure, and contribute to the development of new features and technologies.
Key Responsibilities:
- Design and implement scalable and reliable infrastructure solutions using Python, Node.js, and AWS.
- Collaborate with cross-functional teams to identify and prioritize infrastructure needs.
- Develop and maintain monitoring and logging tools to ensure infrastructure stability and performance.
- Contribute to the development of new features and technologies that improve infrastructure reliability and scalability.
- Participate in on-call rotations to ensure 24/7 infrastructure support.
Requirements:
- 3+ years of experience in software engineering, with a focus on infrastructure reliability.
- Strong understanding of cloud computing platforms, including AWS.
- Proficiency in programming languages such as Python and Node.js.
- Experience with machine learning and data analytics a plus.
- Excellent communication and collaboration skills.