Remote

Site Reliability Engineer - S. A. Solution

Senior Site Reliability Engineer responsible for troubleshooting, incident response, and performance optimization across Kubernetes and AWS environments, ensuring high platform reliability and user satisfaction.

About the role

Key Responsibilities

Act as the primary technical point of contact for user‑reported platform issues, triaging and resolving incidents within defined SLAs.
Investigate, debug, and remediate problems across Kubernetes clusters, AWS infrastructure, and application services.
Collaborate with engineering, product, and customer‑facing teams to root‑cause failures and implement preventive measures.
Design and maintain monitoring, alerting, and logging solutions to detect and mitigate reliability risks.
Participate in on‑call rotations, post‑mortem analysis, and continuous improvement initiatives.

Requirements

Proven experience as an SRE or DevOps engineer in a cloud‑native environment.
Strong knowledge of Kubernetes, AWS services (EC2, EKS, RDS, CloudWatch), and container orchestration.
Hands‑on expertise with monitoring tools (Prometheus, Grafana, Datadog) and incident management.
Excellent troubleshooting, communication, and collaboration skills.
Experience with CI/CD pipelines, scripting (Python, Bash), and infrastructure as code (Terraform, CloudFormation) is a plus.

Skills

Kubernetes, AWS

Company: S. A. Solution

Department: Engineering

Location: Uttar Pradesh, India

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 1,000,000

Posted June 23, 2026