remote, onsite

Head of Site Reliability Engineering

Software Engineer

Lead the Site Reliability Engineering team to design, build, and operate highly available, scalable infrastructure for a global mobile eSports platform, driving automation, reliability, and performance using Kubernetes, AWS, and advanced monitoring tools.

About the role

Key Responsibilities

Lead and mentor a high‑performing SRE team, setting vision and strategy for reliability, scalability, and automation across the platform.
Architect and maintain production infrastructure on AWS, leveraging Kubernetes, Terraform, and CI/CD pipelines to ensure rapid, reliable deployments.
Design and implement observability solutions (metrics, logs, traces) to detect, diagnose, and resolve incidents faster, driving a culture of blameless post‑mortems.
Collaborate with product, engineering, and security teams to define SLAs and SLOs and to plan capacity for millions of concurrent mobile users.
Champion continuous improvement initiatives, including chaos engineering, automated testing, and cost‑optimization strategies.

Requirements

10+ years of experience in large‑scale distributed systems, with 5+ years in a leadership role.
Deep expertise in Kubernetes, AWS services (EKS, EC2, RDS, S3), and IaC tools like Terraform.
Proven track record building CI/CD pipelines, monitoring stacks (Prometheus, Grafana, ELK), and incident response frameworks.
Strong communication skills, able to translate technical concepts to non‑technical stakeholders.
Passion for gaming and a user‑centric mindset, with a desire to innovate in a fast‑moving industry.

Skills

kubernetes, aws, cicd

Department: Engineering

Location: Karnataka, India

Experience: 7+ years

Tenure: full-time

Level: Lead

Salary: 250,000

Posted June 26, 2026