remote

Site Reliability Engineer II - Medallia

Site Reliability Engineer

The Site Reliability Engineer II is responsible for designing, deploying, and maintaining highly available SaaS infrastructure on AWS, using Kubernetes, Docker, and Terraform to ensure performance, reliability, and rapid incident resolution.

About the role

Key Responsibilities

Design, implement, and manage scalable, highly available Kubernetes clusters on AWS to support Medallia’s SaaS platform.
Automate infrastructure provisioning and configuration using Terraform, ensuring repeatable and auditable deployments.
Develop and maintain CI/CD pipelines, integrating automated testing, security scanning, and blue‑green deployments.
Monitor system health with Prometheus, Grafana, and CloudWatch; respond to incidents, conduct post‑mortems, and implement preventive measures.
Collaborate with development teams on application performance, capacity planning, and cost efficiency.

Requirements

3+ years of SRE or DevOps experience in a cloud‑native environment.
Proficiency with Kubernetes, Docker, and AWS services (EKS, EC2, RDS, S3).
Hands‑on experience with Terraform, Helm, and CI/CD tools (GitHub Actions, Jenkins, ArgoCD).
Strong scripting skills in Bash or Python and familiarity with monitoring/alerting tools.
Excellent problem‑solving, communication, and collaboration abilities.

Skills

Kubernetes, Docker, AWS, Terraform

Company: Medallia

Department: Engineering

Location: McLean, VA, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Salary: 150,000

Posted June 19, 2026