Remote

Site Reliability Engineer - Optum

Drive the design, deployment, and operation of scalable cloud infrastructure for Optum Serve, leveraging Kubernetes, Docker, AWS, and Terraform to ensure high availability, performance, and security across commercial and government workloads.

About the role

Key Responsibilities

Architect, implement, and maintain highly available Kubernetes clusters on AWS, ensuring seamless deployment of containerized services.
Develop and manage Terraform modules for infrastructure as code, automating provisioning and lifecycle management.
Implement robust monitoring, logging, and alerting solutions using Prometheus, Grafana, and CloudWatch to detect and remediate incidents proactively.
Collaborate with development teams to integrate CI/CD pipelines, enforce best practices, and streamline release processes.
Conduct capacity planning, performance tuning, and cost optimization across cloud resources.
Respond to on‑call incidents, perform root cause analysis, and drive post‑mortem improvements.

Requirements

3+ years of experience in site reliability engineering or DevOps roles.
Hands‑on expertise with Kubernetes, Docker, and AWS services (EKS, EC2, S3, RDS).
Proficiency in Terraform and configuration management tools.
Strong scripting skills in Bash or Python for automation.
Excellent problem‑solving abilities and a collaborative mindset.

Skills

Kubernetes, Docker, AWS, Terraform

Company: Optum

Department: Engineering

Location: Eden Prairie, Minnesota, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 130,000

Posted June 23, 2026