Remote
Site Reliability Engineer II - Indeed
Senior Site Reliability Engineer focused on designing, building, and maintaining highly available, scalable infrastructure on AWS using Kubernetes, Terraform, Go, and Python, while driving reliability best practices and automation across the platform.
About the role
Key Responsibilities
- Collaborate with product teams to design, code, test, and deploy reliable services on AWS using Kubernetes and Terraform.
- Implement and maintain CI/CD pipelines, ensuring rapid, safe releases with automated testing and blue‑green deployments.
- Develop and maintain observability solutions (metrics, logs, traces) to detect, diagnose, and remediate incidents in real time.
- Automate operational tasks with Go and Python scripts, reducing manual toil and improving system resilience.
- Participate in on‑call rotations, perform root‑cause analysis, and drive post‑mortem improvements.
Requirements
- 5+ years of experience in site reliability or DevOps roles, with a strong focus on cloud infrastructure.
- Proficiency with AWS services (EC2, RDS, S3, EKS) and infrastructure-as-code tools like Terraform.
- Hands‑on experience with Kubernetes cluster management, Helm, and container orchestration.
- Strong programming skills in Go or Python for automation and tooling.
- Deep understanding of monitoring, alerting, and incident response best practices.
Skills
Kubernetes, AWS, Terraform, Go, Python, CI/CD