onsite

Senior Site Reliability Engineer (SRE) - AccelByte

The Senior SRE designs, automates, and operates highly available cloud infrastructure, using Kubernetes, Docker, Terraform, and monitoring tools to keep critical services reliable and performant.

About the role

Key Responsibilities

Design, implement, and maintain scalable, fault‑tolerant infrastructure on AWS using IaC tools such as Terraform.
Develop and manage container platforms (Kubernetes for orchestration, Docker for images and runtime) to support microservice deployments.
Build and maintain CI/CD pipelines, automating build, test, and release processes.
Implement comprehensive monitoring, alerting, and observability solutions with Prometheus, Grafana, and log aggregation tools.
Collaborate with development teams to improve application reliability, performance, and incident response.
Participate in on‑call rotation, conduct root‑cause analysis, and drive post‑mortem improvements.

Requirements

5+ years of experience in site reliability or DevOps engineering, with a strong focus on cloud platforms (AWS).
Proficiency in container technologies (Kubernetes, Docker) and infrastructure as code (Terraform, CloudFormation).
Solid scripting/programming skills in Python or similar languages.
Hands‑on experience with monitoring and observability stacks (Prometheus, Grafana, ELK/EFK).
Deep understanding of Linux systems, networking, and CI/CD concepts.

Skills

Kubernetes, Docker, Terraform, Prometheus, Grafana, Python, AWS, CI/CD

Company: AccelByte

Department: Engineering

Location: Sleman, India

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 25, 2026