Remote

Sr Site Reliability Engineer - Cloud, AI Operations, SaaS, PaaS - Optum


Senior Site Reliability Engineer focused on cloud‑native AI operations for SaaS and PaaS platforms, driving reliability, automation, and performance at scale.

About the role

Key Responsibilities

Design, implement, and maintain highly available cloud infrastructure for AI‑driven SaaS and PaaS services.
Develop and manage CI/CD pipelines, ensuring rapid, reliable deployments across multi‑cloud environments.
Implement observability, monitoring, and alerting solutions to detect and remediate incidents proactively.
Collaborate with development, security, and product teams to embed SRE best practices into the software lifecycle.
Lead incident response, root‑cause analysis, and post‑mortem documentation to continuously improve system resilience.

Requirements

5+ years of experience in site reliability engineering or related roles.
Proficiency with Kubernetes, Docker, and cloud platforms (AWS, Azure, or GCP).
Strong scripting skills in Python and experience with CI/CD tools (GitHub Actions, Jenkins, ArgoCD).
Hands‑on experience with monitoring/observability stacks (Prometheus, Grafana, ELK, or similar).
Excellent problem‑solving skills and a proactive, collaborative mindset.

Skills

Kubernetes, CI/CD, Python

Company: Optum

Department: Operations

Location: Tamil Nadu, India

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 19, 2026