Remote

Staff Site Reliability Engineer - Infra - Okta

Site Reliability Engineer

Lead the design, build, and operation of highly scalable, secure infrastructure on AWS and GCP, driving reliability, automation, and incident response for production systems.

About the role

Key Responsibilities

Design, build, and operate highly scalable, reliable, and secure infrastructure powering production systems across AWS and GCP.
Lead major reliability initiatives, including capacity planning, performance tuning, and cost optimization.
Implement and maintain CI/CD pipelines, infrastructure as code (Terraform), and automated testing to accelerate delivery.
Develop and enforce monitoring, alerting, and incident response processes to meet a 99.99% uptime target.
Collaborate with cross‑functional teams to define SLAs, SLOs, and error budgets.

Requirements

5+ years of SRE or DevOps experience in large, distributed environments.
Deep expertise with AWS and GCP services, Kubernetes, and Terraform.
Strong scripting skills (Python, Bash) and experience with CI/CD tools (GitHub Actions, Jenkins, ArgoCD).
Proven track record of building observability solutions (Prometheus, Grafana, ELK) and managing incidents.
Excellent communication, problem‑solving, and collaboration skills.

Skills

AWS, GCP, Kubernetes, Terraform, CI/CD

Company: Okta

Department: Engineering

Location: Karnataka, India

Experience: 5+ years

Tenure: Full-time

Level: Lead

Posted June 19, 2026