remote

Staff Site Reliability Operations Engineer - Calix

Systems Engineer

Lead the design, implementation, and operation of scalable, highly available cloud services on AWS using Kubernetes, Terraform, and Python, driving automation, reliability, and performance for a cloud‑first, AI‑powered platform.

About the role

Key Responsibilities

Architect and maintain highly available, scalable Kubernetes clusters on AWS, ensuring zero downtime and optimal resource utilization.
Develop and manage IaC pipelines with Terraform, automating infrastructure provisioning and configuration across multiple environments.
Implement robust CI/CD workflows, integrating automated testing, security scanning, and blue‑green deployments for rapid, reliable releases.
Design and maintain an observability stack (Prometheus, Grafana, Loki, etc.) that provides real‑time metrics, logs, and alerts to drive proactive incident response.
Collaborate with development, security, and product teams to define SLOs, SLIs, and incident management processes, fostering a culture of reliability.

Requirements

5+ years of experience in Site Reliability Engineering or DevOps roles, with deep expertise in Kubernetes and AWS.
Proficiency in Terraform, Python, and CI/CD tooling (GitHub Actions, Argo CD, Jenkins).
Strong background in monitoring, logging, and alerting solutions (Prometheus, Grafana, Loki, ELK).
Excellent problem‑solving skills and a proactive approach to automation and process improvement.
Effective communication skills and the ability to work cross‑functionally in a fast‑paced environment.

Skills

Kubernetes, AWS, Terraform, Python, CI/CD

Company: Calix

Department: Operations

Location: United States

Experience: 7+ years

Tenure: Full-time

Level: Lead

Salary: 265,700

Posted June 19, 2026