remote

Senior Site Reliability Engineer - DigiCert

Site Reliability Engineer

Senior SRE who drives reliability, scalability, and performance for cloud‑native services, leveraging Kubernetes, AWS, Terraform, and automation with Python and CI/CD pipelines.

About the role

Key Responsibilities

Design, implement, and maintain highly available, scalable infrastructure on AWS using Kubernetes, Docker, and Terraform.
Develop and own monitoring, alerting, and observability solutions with Prometheus, Grafana, and custom Python scripts.
Collaborate with development teams to embed reliability best practices into CI/CD pipelines and application code.
Automate routine operational tasks, incident response, and post‑mortem analysis to continuously improve system resilience.
Lead capacity planning, performance tuning, and disaster‑recovery testing for mission‑critical services.

Requirements

5+ years of experience in site reliability or DevOps roles, with deep expertise in Kubernetes, Docker, and AWS services.
Proficiency in infrastructure‑as‑code tools such as Terraform and strong scripting skills in Python or Bash.
Hands‑on experience with monitoring stacks (Prometheus, Grafana) and building robust CI/CD pipelines (Jenkins, GitLab CI, or similar).
Solid understanding of Linux systems, networking, and security best practices.
Track record of driving reliability improvements, incident management, and automation at scale.

Skills

Kubernetes, Docker, AWS, Terraform, Python, Prometheus, CI/CD, Linux

Company: DigiCert

Department: Engineering

Location: Lehi, Utah, United States

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 22, 2026