remote
Data Center Production Operations Engineer - Meta
Systems Engineer
Operate and maintain large‑scale data center infrastructure, handling server hardware lifecycle, rack deployments, incident response, and capacity planning while collaborating with hardware, network, and facilities teams.
About the role
Key Responsibilities
- Execute hands‑on installation, configuration, and de‑commissioning of server hardware across high‑density racks.
- Monitor and troubleshoot production systems, responding to incidents to restore service within defined SLAs.
- Coordinate capacity planning and readiness activities to support growth and scaling of data center resources.
- Collaborate with hardware engineering, network operations, and facilities teams to ensure infrastructure meets availability and performance targets.
- Develop and maintain automation scripts (Python, Bash) for routine tasks, deployment workflows, and health checks.
Requirements
- Strong experience with Linux operating systems and command‑line tools.
- Proficiency in scripting languages such as Python or Bash for automation and troubleshooting.
- Hands‑on knowledge of server hardware lifecycle processes, rack‑level deployments, and data center operations.
- Demonstrated ability to diagnose and resolve complex incidents in a fast‑paced environment.
- Experience with capacity planning, performance monitoring, and working cross‑functionally with engineering and facilities teams.