Remote

Senior Site Reliability Engineer - Synthesia

Site Reliability Engineer

Lead the design, deployment, and operation of scalable, highly available services on Kubernetes and AWS, ensuring reliability, performance, and rapid incident response for a global AI video platform.

About the role

Key Responsibilities

Architect and maintain highly available, scalable infrastructure on Kubernetes and AWS, ensuring 99.99% uptime for mission‑critical services.
Design and implement CI/CD pipelines, automated testing, and blue‑green deployments to accelerate feature delivery while minimizing risk.
Monitor system health with Prometheus, Grafana, and custom alerts; conduct post‑mortem analysis and drive continuous improvement.
Collaborate with development, security, and product teams to embed observability, resilience, and cost‑efficiency into every release.
Lead incident response, root‑cause analysis, and knowledge‑sharing sessions to elevate team expertise.

Requirements

5+ years of SRE or DevOps experience in a high‑scale, cloud‑native environment.
Proficiency with Kubernetes, Docker, Helm, and Terraform for infrastructure as code.
Strong scripting skills (Python, Bash) and experience with CI/CD tools (GitHub Actions, ArgoCD, Jenkins).
Hands‑on experience with AWS services (EKS, EC2, S3, CloudWatch) and monitoring tools (Prometheus, Grafana).
Excellent problem‑solving, communication, and collaboration skills in a fast‑moving, cross‑functional team.

Skills

Kubernetes, Docker, CI/CD, AWS, Prometheus, Grafana, Terraform

Company: Synthesia

Department: Engineering

Location: United States

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 22, 2026