remote
Production Systems Engineer, Automation - Meta
Systems Engineer
Design and implement test automation and tooling to improve reliability, efficiency, and scalability of large‑scale data‑center hardware infrastructure, collaborating with operations, hardware, and platform teams.
About the role
Key Responsibilities
- Design, develop, and maintain automated test frameworks and tooling for compute, storage, networking, and custom silicon platforms.
- Integrate automated tests into CI/CD pipelines to validate hardware changes and ensure rapid, reliable rollouts across global data centers.
- Collaborate with data‑center operations, hardware engineering, and vendor partners to identify systemic issues and implement scalable solutions.
- Monitor production fleet health, analyze failure patterns, and drive continuous improvement of reliability and performance metrics.
- Document tooling, test procedures, and best practices to enable knowledge sharing across engineering teams.
Requirements
- Strong programming experience in Python and C++ for building automation and low‑level hardware interfaces.
- Hands‑on experience with Linux systems, scripting, and performance debugging in large‑scale environments.
- Proven track record developing test automation frameworks and integrating them into CI/CD workflows.
- Understanding of data‑center hardware components (servers, storage, networking, ASICs) and their operational characteristics.
- Excellent problem‑solving skills and ability to work cross‑functionally with hardware, software, and operations teams.
Skills
python, c, test automation, linux, cicd