remote

Site Reliability Engineer - Booz Allen Hamilton

Site Reliability Engineer focused on building resilient, automated cloud infrastructure for the Intelligence Community, leveraging monitoring, redundancy, and scripting to reduce toil and improve system reliability.

About the role

Key Responsibilities

Design, implement, and maintain highly available, scalable cloud infrastructure to support mission-critical applications.
Develop and deploy automation scripts to reduce manual toil and enable self‑repair capabilities.
Implement comprehensive monitoring, alerting, and logging solutions to detect and resolve incidents proactively.
Collaborate with development and operations teams to embed reliability best practices into the software delivery lifecycle.
Conduct post‑incident reviews, root‑cause analysis, and continuous improvement initiatives.

Requirements

Proven experience in Site Reliability Engineering or related roles with a strong focus on cloud platforms.
Hands‑on expertise in automation tools (e.g., Terraform, Ansible, Python scripting).
Deep knowledge of monitoring and observability stacks (e.g., Prometheus, Grafana, ELK).
Strong understanding of networking, security, and high‑availability design principles.
Excellent problem‑solving skills and a proactive, collaborative mindset.

Skills

AWS, Kubernetes, Docker, Linux, Jenkins, Jira, Confluence

Company: Booz Allen Hamilton

Department: Engineering

Location: Chantilly, VA, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 141,000

Posted June 19, 2026