onsite
Site Reliability Engineer III - Domino's
Site Reliability Engineer
Senior Site Reliability Engineer driving reliability, scalability, and automation for a high‑traffic digital platform using Kubernetes, Docker, AWS, Terraform, and CI/CD pipelines.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure for a global e‑commerce platform using Kubernetes and Docker.
- Automate deployment pipelines with CI/CD tools, Terraform, and GitOps practices to ensure rapid, reliable releases.
- Monitor system health, troubleshoot incidents, and conduct post‑mortem analyses to improve reliability and performance.
- Collaborate with development, security, and product teams to embed SRE principles into the software development lifecycle.
- Implement observability solutions (metrics, logs, traces) and define SLIs/SLOs to drive continuous improvement.
Requirements
- 5+ years of experience in site reliability or DevOps roles, with a strong focus on cloud-native technologies.
- Proficiency with AWS services (EC2, EKS, RDS, CloudWatch) and infrastructure-as-code tools such as Terraform.
- Hands‑on experience with Kubernetes, Docker, and CI/CD tooling (GitHub Actions, Jenkins, Argo CD).
- Strong scripting skills in Python or Bash for automation and tooling.
- Excellent problem‑solving, communication, and collaboration skills in a fast‑paced environment.
Skills
Kubernetes, Docker, AWS, Terraform, CI/CD, Python