remote

Senior Site Reliability Engineer - Red Hat

Site Reliability Engineer

Lead the design, deployment, and operation of a multi‑tenant Kubernetes platform on AWS, blending software engineering with production reliability to deliver a managed OpenShift service.

About the role

Key Responsibilities

Architect, build, and maintain the control plane infrastructure for Red Hat OpenShift Service on AWS (ROSA) with hosted control planes (HCP), ensuring high availability, scalability, and security.
Write production‑grade code in Go (and Python) to extend platform capabilities, contribute to upstream OpenShift and Kubernetes projects, and automate operational tasks.
Own end‑to‑end reliability: design observability, monitoring, alerting, and incident response processes; participate in on‑call rotations.
Implement and evolve infrastructure as code (IaC) with Terraform, GitOps workflows, and CI/CD pipelines to streamline releases and rollbacks.
Collaborate with cross‑functional teams (product, security, support) to define SLAs, plan capacity, and develop cost optimization strategies.

Requirements

5+ years of SRE or DevOps experience in cloud‑native environments, with deep knowledge of Kubernetes and OpenShift.
Proficient in AWS services (EKS, VPC, IAM, CloudWatch) and infrastructure automation tools.
Strong coding skills in Go (or similar) and experience with CI/CD, GitOps, and Terraform.
Hands‑on experience with monitoring/observability stacks (Prometheus, Grafana, Loki) and incident management.
Excellent communication, problem‑solving, and collaboration abilities in a fast‑paced, distributed team.

Skills

Kubernetes, AWS, Go, Terraform

Company: Red Hat

Department: Engineering

Location: Raleigh, North Carolina, United States

Experience: 5+ years

Tenure: full-time

Level: Senior

Salary: 195,680

Posted June 22, 2026