Remote
Global Production Systems Engineer - Meta
Systems Engineer
Lead the design, deployment, and optimization of large‑scale infrastructure across Meta’s global data centers, driving automation, capacity planning, and operational excellence.
About the role
Key Responsibilities
- Architect and implement scalable production systems for thousands of servers, ensuring high availability and performance.
- Develop and maintain automation scripts (Python, Bash) to streamline deployment, configuration, and monitoring workflows.
- Collaborate with cross‑functional teams to prioritize workstreams based on operational impact and evolving infrastructure needs.
- Analyze capacity and performance metrics to forecast growth, optimize resource utilization, and reduce operational costs.
- Lead incident response and root‑cause analysis, driving continuous improvement in reliability and resilience.
Requirements
- 5+ years of experience in production systems engineering within large data center environments.
- Proficiency in scripting (Python, Bash) and infrastructure automation tools.
- Strong understanding of monitoring, alerting, and capacity planning practices.
- Excellent problem‑solving skills and the ability to work in a fast‑paced, collaborative setting.
- Experience with cloud platforms (AWS, Azure, or GCP) is a plus.