onsite

Site Reliability Engineer - Manchester - NS West 1 - BAE Systems

Site Reliability Engineer

Senior Site Reliability Engineer responsible for designing, deploying, and maintaining highly available cloud infrastructure, automating CI/CD pipelines, and ensuring robust monitoring and incident response across Kubernetes clusters using AWS services.

About the role

Key Responsibilities

Design, implement, and manage scalable Kubernetes clusters on AWS, ensuring high availability and performance.
Develop and maintain CI/CD pipelines with GitHub Actions, Terraform, and Helm for automated application delivery.
Implement comprehensive monitoring, logging, and alerting using Prometheus, Grafana, and the ELK stack.
Lead incident response, root cause analysis, and post‑mortem documentation to improve system reliability.
Collaborate with development teams to enforce best practices in code quality, security, and infrastructure as code.

Requirements

5+ years of experience in site reliability or DevOps roles.
Proficiency with Kubernetes, Docker, and cloud platforms (AWS preferred).
Strong scripting skills in Python or Bash and experience with Terraform or CloudFormation.
Hands‑on experience with CI/CD tools (GitHub Actions, Jenkins, Argo CD) and monitoring solutions.
Excellent problem‑solving skills and a proactive approach to automation and reliability.

Skills

Kubernetes, Docker, CI/CD, AWS, Python

Company: BAE Systems

Department: Engineering

Location: Gloucester, ENG, United Kingdom

Experience: 7+ years

Tenure: Full-time

Level: Mid-Level

Posted June 19, 2026