onsite
Systems Engineer - SRE Enablement - AutoZone
Site Reliability Engineer
Lead SRE Enablement across hybrid GCP and on‑prem environments: establish reliability standards, build shared tooling, and coach teams to embed operational excellence in their services.
About the role
Key Responsibilities
- Define and enforce SRE best practices, reliability standards, and incident response procedures across the organization.
- Design, develop, and maintain shared automation and monitoring tools that span Google Cloud Platform and on‑prem infrastructure.
- Collaborate with application, infrastructure, and architecture teams to integrate SRE principles into new and existing services.
- Provide hands‑on guidance and mentorship to development teams on reliability, observability, and capacity planning.
- Drive continuous improvement through blameless post‑mortem analysis and reliability metrics.
Requirements
- 5+ years of experience in SRE, DevOps, or systems engineering roles.
- Deep expertise with Google Cloud Platform services (Compute Engine, Google Kubernetes Engine, Cloud Monitoring, Cloud Logging).
- Strong scripting/automation skills (Python, Bash, Terraform, or similar).
- Proven track record of building and scaling monitoring, alerting, and incident response tooling.
- Excellent communication skills and a collaborative mindset.
Skills
Python, Go, Java, GCP, Kubernetes, Terraform, Ansible