We are seeking a highly skilled Reliability/DFX Engineer to join our team at OpenAI. As a key member of our engineering organization, you will design and implement reliability and Design for Excellence (DFX) engineering solutions to ensure the stability and quality of our AI systems.
Key Responsibilities:
- Develop and maintain reliability and DFX engineering processes and tools to ensure the quality and stability of our AI systems.
- Collaborate with cross-functional teams, including software engineering, data science, and product management, to identify and prioritize reliability and DFX engineering initiatives.
- Design and implement experiments to measure and improve the reliability and quality of our AI systems.
- Develop and maintain metrics and dashboards to track reliability and quality performance.
- Identify and prioritize areas for reliability and quality improvement, and develop plans to address them.
Requirements:
- 5+ years of experience in software engineering, reliability engineering, or a related field.
- Strong understanding of software development principles, including testing, debugging, and deployment.
- Experience with machine learning and AI systems, including data preprocessing, model training, and deployment.
- Strong programming skills in languages such as Python.
- Experience with cloud-based infrastructure, including AWS.