remote

Site Reliability Engineer - Multi Cloud Kubernetes

Site Reliability Engineer responsible for designing and operating a secure, observable multi‑cloud platform on AWS, Azure and GCP using Kubernetes, infrastructure‑as‑code, and AI‑driven automation.

About the role

Key Responsibilities

Architect, build, and maintain a scalable Kubernetes platform spanning AWS, Azure, and GCP.
Implement infrastructure‑as‑code using Terraform and Helm to ensure repeatable, version‑controlled deployments.
Develop automation scripts and tooling in Python to streamline provisioning, configuration, and incident response.
Establish observability pipelines with Prometheus, Grafana, and logging solutions for proactive monitoring and alerting.
Apply zero‑trust security controls, compliance frameworks, and SRE best practices to guarantee reliability and data protection.

Requirements

5+ years of experience in SRE or DevOps roles with deep expertise in Kubernetes and container orchestration.
Hands‑on experience managing workloads across AWS, Azure, and Google Cloud Platform.
Proficiency with Terraform (or similar IaC tools) and Helm for automated infrastructure delivery.
Strong scripting/programming skills in Python and familiarity with CI/CD pipelines.
Experience implementing monitoring, alerting, and zero‑trust security models in production environments.

Skills

Kubernetes, AWS, Azure, Terraform, Python, Prometheus, Helm

Department: Engineering

Location: South Riding, Virginia, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 25, 2026