onsite

Senior Site Reliability Engineer - AcquireX

Site Reliability Engineer

Senior Site Reliability Engineer leading end‑to‑end reliability, automation, and observability for mission‑critical production systems using Kubernetes, Prometheus, Grafana, CI/CD pipelines, AWS, and Terraform.

About the role

Key Responsibilities

Own end‑to‑end reliability of production services, ensuring high availability, scalability, and performance.
Design, implement, and maintain CI/CD pipelines, infrastructure as code, and automated deployment workflows.
Build and maintain the observability stack (Prometheus, Grafana, Loki) for real‑time monitoring, alerting, and incident response.
Lead incident management, root‑cause analysis, and post‑mortem documentation to drive continuous improvement.
Collaborate with development teams to embed SRE best practices into code reviews, architecture decisions, and release processes.

Requirements

10+ years of experience in SRE/DevOps roles with a proven track record in large‑scale distributed systems.
Deep expertise in Kubernetes, container orchestration, and cloud platforms (AWS preferred).
Hands‑on experience with Prometheus, Grafana, Loki, and other observability tools.
Strong scripting skills (Python, Bash) and proficiency with IaC tools such as Terraform or CloudFormation.
Excellent communication, problem‑solving, and on‑call readiness for 24/7 production support.

Skills

Kubernetes, Prometheus, Grafana, CI/CD, AWS, Terraform

Company: AcquireX

Department: Engineering

Location: Maharashtra, India

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 20, 2026