Remote

Senior Site Reliability Engineer - Cvent

Site Reliability Engineer

The Senior Site Reliability Engineer designs, automates, and operates highly available cloud infrastructure, using Kubernetes, AWS, Terraform, and modern monitoring tools to ensure reliability and performance at scale.

About the role

Key Responsibilities

Design, build, and maintain scalable, highly available services on AWS using infrastructure‑as‑code (Terraform) and container orchestration (Kubernetes).
Develop automation scripts and tools in Python and Go to streamline deployment, configuration, and incident response workflows.
Implement robust monitoring, alerting, and observability solutions with Prometheus, Grafana, and logging pipelines to proactively detect and resolve issues.
Collaborate with development and product teams to define SLOs/SLIs, conduct capacity planning, and drive performance optimizations.
Lead on‑call rotations, perform root‑cause analysis, and create post‑mortem documentation to continuously improve reliability.

Requirements

5+ years of experience in site reliability or DevOps engineering, with a strong focus on cloud platforms (AWS) and container orchestration (Kubernetes).
Proficiency in scripting/programming languages such as Python and Go.
Hands‑on experience with infrastructure‑as‑code tools (Terraform, CloudFormation) and CI/CD pipelines (Jenkins, GitLab CI, or similar).
Deep understanding of Linux systems, networking, and performance tuning.
Experience with monitoring and observability stacks (Prometheus, Grafana, ELK/EFK) and a track record of implementing SLO/SLI frameworks.

Skills

Python, Go, Kubernetes, AWS, Terraform, Prometheus, CI/CD, Linux

Company: Cvent

Department: Engineering

Location: Tysons Corner, Virginia, United States

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 27, 2026