Work mode: Onsite

Senior Site Reliability Engineer - Happy Staffers

Lead platform reliability for a cloud‑native stack: troubleshoot Kubernetes and AWS infrastructure, drive incident response, and work with engineering and product teams to maintain high availability and performance.

About the role

Key Responsibilities

Act as the primary technical point of contact for user‑reported platform issues, triaging and resolving within defined SLAs.
Investigate, debug, and remediate incidents across Kubernetes clusters, AWS services, and application components.
Collaborate with engineering, product, and customer‑facing teams to identify root causes and implement preventive measures.
Design and maintain monitoring, alerting, and logging solutions to ensure proactive detection of reliability problems.
Participate in on‑call rotations, post‑mortem analysis, and continuous improvement of SRE practices.

Requirements

5+ years of experience in Site Reliability Engineering or DevOps roles.
Deep expertise with Kubernetes, AWS, and cloud‑native application stacks.
Strong troubleshooting skills and familiarity with monitoring tools (Prometheus, Grafana, CloudWatch).
Experience with incident response, root‑cause analysis, and post‑mortem documentation.
Excellent communication skills and ability to work cross‑functionally.

Skills

Kubernetes, AWS

Company: Happy Staffers

Department: Engineering

Location: Uttar Pradesh, India

Experience: 5+ years

Tenure: Full-time

Level: Senior

Salary: 1,000,000

Posted June 23, 2026