remote

Staff Site Reliability Engineer - Okta

Site Reliability Engineer

Lead the design, build, and operation of highly scalable, secure infrastructure on AWS and GCP, driving reliability, performance, and automation for production systems.

About the role

Key Responsibilities

Design, build, and operate highly scalable, reliable, and secure infrastructure powering production systems across AWS and GCP.
Lead major reliability initiatives, including capacity planning, performance tuning, and cost optimization.
Implement and maintain CI/CD pipelines, automated testing, and deployment workflows to accelerate feature delivery.
Develop and enforce observability practices (metrics, logging, and tracing) to detect, diagnose, and resolve incidents quickly.
Collaborate with cross‑functional teams to define SLAs, SLOs, and incident response procedures.

Requirements

5+ years of experience in Site Reliability Engineering or DevOps roles.
Deep expertise with AWS and GCP services (EC2, ECS, EKS, GKE, Cloud Run, Cloud Functions).
Proficiency in Kubernetes, container orchestration, and infrastructure as code (Terraform, CloudFormation).
Strong scripting skills (Python, Bash) and experience with CI/CD tools (GitHub Actions, Jenkins, Argo CD).
Hands‑on experience with monitoring, alerting, and incident management tools (Prometheus, Grafana, PagerDuty, Datadog).

Skills

aws, gcp, kubernetes, cicd

Company: Okta

Department: Engineering

Location: Karnataka, India

Experience: 5+ years

Tenure: full-time

Level: Lead

Posted June 19, 2026