Remote

Principal Site Reliability Engineer - Saviynt

Site Reliability Engineer

Lead the design, build, and operation of scalable, reliable infrastructure services for a global SaaS platform, driving automation, observability, and incident response excellence using Kubernetes, AWS, Terraform, and advanced CI/CD practices.

About the role

Key Responsibilities

Architect and maintain shared infrastructure services that support product and application teams across a mission‑critical SaaS platform.
Design and implement scalable, highly available Kubernetes clusters and associated tooling on AWS.
Develop and enforce infrastructure-as-code (IaC) standards using Terraform to ensure repeatable, auditable deployments.
Lead incident response, root‑cause analysis, and post‑mortem processes to continuously improve reliability.
Collaborate with security and compliance teams to embed best practices into all infrastructure components.
Mentor and guide SRE and DevOps teams, fostering a culture of automation, observability, and proactive problem‑solving.

Requirements

10+ years of experience in site reliability engineering or related roles, with a strong focus on cloud-native environments.
Deep expertise in Kubernetes, AWS services (EKS, EC2, S3, CloudWatch), and Terraform.
Proven track record of building robust CI/CD pipelines and implementing comprehensive monitoring/alerting solutions.
Strong incident‑management skills and experience with post‑mortem documentation.
Excellent communication and leadership abilities, with a passion for mentoring junior engineers.

Skills

Kubernetes, AWS, Terraform, CI/CD

Company: Saviynt

Department: Engineering

Location: Vancouver, CA, United States

Experience: 7+ years

Tenure: Full-time

Level: Lead

Posted June 19, 2026