onsite
Engineer, Senior - Qualcomm
Software Engineer
About the role
The Senior AI/ML Stability Test Engineer designs and executes long‑duration stability, stress, and fault‑injection tests on large‑scale AI inference platforms to ensure the robustness and high availability of model‑serving systems.
Key Responsibilities
- Design and implement long‑duration stability (MTBF/soak) tests for AI inference platforms.
- Develop and run stress, fuzzing, and fault‑injection frameworks to evaluate system resilience.
- Analyze test results, identify failure modes, and collaborate with engineering teams to drive performance improvements.
- Automate test execution pipelines using Python and CI/CD tools.
- Document test strategies, metrics, and recommendations for reliability enhancements.
Requirements
- 5+ years of experience in software testing, with a focus on AI/ML or high‑performance systems.
- Proficiency in Python and test automation frameworks.
- Strong understanding of AI inference pipelines, model serving, and performance tuning.
- Experience with stress testing, fault injection, and reliability engineering.
- Excellent analytical, communication, and problem‑solving skills.
Skills
Python, machine learning, test automation