Remote, Onsite

Site Reliability Engineer - 66degrees

Senior Site Reliability Engineer responsible for designing, deploying, and maintaining highly available cloud-native infrastructure using Kubernetes, Docker, and AWS. Drives automation, observability, and incident response to ensure seamless application delivery.

About the role

Key Responsibilities

Design, implement, and manage scalable Kubernetes clusters on AWS, ensuring high availability and performance.
Automate infrastructure provisioning and configuration using Terraform and CI/CD pipelines.
Implement monitoring, alerting, and logging with Prometheus, Grafana, and the ELK stack to maintain system health.
Lead incident response, root‑cause analysis, and post‑mortem documentation to improve reliability.
Collaborate with development teams to embed SRE best practices into the application lifecycle.

Requirements

5+ years of experience in site reliability or DevOps roles.
Proficiency with Kubernetes, Docker, and AWS services (EKS, EC2, S3, CloudWatch).
Strong scripting skills (Python, Bash) and experience with Terraform.
Hands‑on experience with CI/CD tools (GitHub Actions, Jenkins, ArgoCD).
Excellent problem‑solving skills and a proactive approach to automation and reliability.

Skills

Kubernetes, Docker, AWS, Terraform, Prometheus, Grafana, CI/CD

Company: 66degrees

Department: Engineering

Location: Bengaluru, Maharashtra, India

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 21, 2026