remote

Site Reliability Engineer, Senior - Booz Allen Hamilton


Senior Site Reliability Engineer focused on building resilient, automated infrastructure for the Intelligence Community using Kubernetes, Prometheus, Grafana, AWS, and Terraform to reduce toil and improve system reliability.

About the role

Key Responsibilities

Design, deploy, and maintain highly available Kubernetes clusters and associated services.
Implement comprehensive monitoring with Prometheus and Grafana, creating alerts and dashboards for critical metrics.
Automate infrastructure provisioning and configuration using Terraform and AWS CloudFormation.
Develop and maintain Bash and Python scripts that reduce manual toil and enable self‑healing mechanisms.
Collaborate with development teams to embed SRE best practices into CI/CD pipelines and application deployments.

Requirements

5+ years of experience in site reliability, DevOps, or systems engineering roles.
Proficiency with Kubernetes, Docker, and container orchestration at scale.
Strong scripting skills in Bash and Python, with a track record of automating routine tasks.
Hands‑on experience with Prometheus, Grafana, and alerting systems.
Experience deploying and managing workloads on AWS, including EC2, EKS, and related services.

Skills

Python, Bash, Kubernetes, Prometheus, Grafana, AWS, Terraform

Company: Booz Allen Hamilton

Department: Engineering

Location: Aurora, CO, United States

Experience: 5+ years

Tenure: Full-time

Level: Senior

Salary: 198,000

Posted June 19, 2026