Remote / Onsite

SENIOR SITE RELIABILITY ENGINEER - Svitla Systems

Site Reliability Engineer

Senior Site Reliability Engineer responsible for automating incident routing, managing uptime metrics, and ensuring 24/7 reliability of a large online marketplace using Kubernetes, Prometheus, Grafana, and AWS.

About the role

Key Responsibilities

Design and implement automated incident routing to ensure rapid response across multiple teams.
Own and improve key reliability metrics such as MTTD, MTTR, and uptime for a 24/7 marketplace.
Develop and maintain the observability stack using Prometheus, Grafana, and custom dashboards.
Collaborate with development and operations teams to embed reliability best practices into CI/CD pipelines.
Lead post‑mortem analysis and drive continuous improvement initiatives.

Requirements

5+ years of experience in site reliability or DevOps roles.
Proficiency with Kubernetes, container orchestration, and cloud platforms (AWS preferred).
Strong scripting skills (Python, Bash) and experience with monitoring/alerting tools.
Excellent incident management and communication skills.
Experience with CI/CD tooling and automated deployment pipelines.

Skills

Kubernetes, Prometheus, Grafana, AWS

Company: Svitla Systems

Department: Engineering

Location: Uttar Pradesh, India

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 19, 2026