onsite

Site Reliability Engineer - Manchester - BAE Systems

Senior Site Reliability Engineer responsible for designing, deploying, and maintaining highly available cloud-native services using Kubernetes, Docker, and AWS, while ensuring robust monitoring, alerting, and automation across production environments.

About the role

Key Responsibilities

Design, implement, and manage scalable Kubernetes clusters and Docker-based microservices across AWS environments.
Develop and maintain CI/CD pipelines using Git, Jenkins, and Terraform to automate deployments and infrastructure provisioning.
Implement comprehensive monitoring, logging, and alerting with Prometheus, Grafana, and the ELK stack to ensure high availability and performance.
Collaborate with development teams to enforce best practices for code quality, security, and observability.
Conduct root cause analysis, post‑incident reviews, and continuous improvement initiatives to reduce MTTR and prevent recurrence.

Requirements

5+ years of experience in site reliability or DevOps roles within cloud-native environments.
Proficiency with Kubernetes, Docker, and AWS services (EKS, EC2, S3, CloudWatch).
Hands‑on experience building CI/CD pipelines and IaC with Terraform or CloudFormation.
Strong scripting skills in Python or Bash and familiarity with monitoring tools such as Prometheus and Grafana.
Excellent problem‑solving abilities, strong communication skills, and a proactive, collaborative mindset.

Skills

Kubernetes, Docker, CI/CD, AWS, Prometheus, Grafana, Terraform

Company: BAE Systems

Department: Engineering

Location: Gloucester, ENG, United Kingdom

Experience: 7+ years

Tenure: Full-time

Level: Mid-Level

Posted June 19, 2026