Remote
Senior Site Reliability Engineer - Optum
Site Reliability Engineer
The Senior Site Reliability Engineer designs, deploys, and maintains scalable cloud infrastructure on AWS, using Kubernetes, Docker, Terraform, and CI/CD pipelines to keep critical health services highly available and performant.
About the role
Key Responsibilities
- Architect and manage highly available, scalable cloud environments on AWS for Optum Serve’s services.
- Design, implement, and maintain Kubernetes clusters, Docker containers, and related CI/CD pipelines.
- Automate infrastructure provisioning and configuration using Terraform and other IaC tools.
- Monitor system performance, troubleshoot incidents, and implement proactive reliability improvements.
- Collaborate with development teams to integrate best practices for security, observability, and cost optimization.
Requirements
- 5+ years of experience in site reliability or DevOps roles.
- Proficiency with AWS services (EC2, EKS, RDS, S3, CloudWatch).
- Strong scripting skills in Python or Go and experience with Docker and Kubernetes.
- Hands‑on experience with Terraform, Helm, and CI/CD tools (GitHub Actions, Jenkins, ArgoCD).
- Excellent problem‑solving skills and a passion for automation and continuous improvement.
Skills
Kubernetes, Docker, AWS, Terraform, CI/CD, Python, Go