Remote

Senior Site Reliability Engineer - BillingPlatform

Site Reliability Engineer

Lead the reliability and scalability of a cloud‑native SaaS platform, driving automation, performance, and uptime using Kubernetes, AWS, Terraform, and advanced monitoring tools.

About the role

Key Responsibilities

Design, implement, and maintain highly available, scalable infrastructure for a global SaaS revenue lifecycle platform on AWS.
Automate deployment pipelines with CI/CD, Terraform, and GitOps practices to ensure rapid, reliable releases.
Implement and manage Kubernetes clusters, ensuring optimal resource utilization, security, and resilience.
Develop and maintain an observability stack (Prometheus, Grafana, Loki, etc.) for real‑time monitoring, alerting, and capacity planning.
Lead incident response, root‑cause analysis, and post‑mortem processes to continuously improve system reliability.
Collaborate with development, security, and product teams to embed SRE principles across the organization.

Requirements

5+ years of experience in site reliability or DevOps roles, preferably in SaaS environments.
Deep expertise with AWS services (EC2, RDS, S3, EKS, CloudWatch) and Kubernetes cluster management.
Proficiency in infrastructure as code using Terraform and automation with CI/CD pipelines.
Strong scripting skills in Python or Bash for automation and tooling.
Experience with monitoring, logging, and alerting solutions (Prometheus, Grafana, Loki, ELK).

Skills

Kubernetes, AWS, Terraform, CI/CD

Company: BillingPlatform

Department: Engineering

Location: Englewood, Colorado, United States

Experience: 5+ years

Tenure: full-time

Level: Senior

Salary: 195,000

Posted June 22, 2026