remote
Site Reliability Engineer, Covert Capability - Security Service (MI5)
The Senior Site Reliability Engineer designs, deploys, and maintains secure, highly available infrastructure for covert operations, using Kubernetes, Docker, CI/CD pipelines, AWS, and advanced monitoring to ensure resilience and rapid incident response.
About the role
Key Responsibilities
- Design, implement, and manage secure, highly available infrastructure for covert operations using Kubernetes, Docker, and AWS services.
- Develop and maintain CI/CD pipelines to automate application delivery and infrastructure provisioning.
- Implement robust monitoring, alerting, and logging solutions to detect and respond to incidents in real time.
- Collaborate with development and security teams to enforce best practices, conduct code reviews, and perform security hardening.
- Lead incident investigations, root cause analysis, and post‑mortem documentation to continuously improve reliability.
Requirements
- Proven experience as a Site Reliability Engineer or similar role in a high‑security environment.
- Strong proficiency with Kubernetes, Docker, and AWS (EC2, EKS, S3, CloudWatch).
- Hands‑on expertise in CI/CD tooling (GitLab CI, Jenkins, Argo CD), configuration management (Ansible), and infrastructure as code (Terraform).
- Deep knowledge of Linux system administration, networking, and security hardening.
- Excellent problem‑solving skills, ability to work under pressure, and strong communication abilities.
Skills
kubernetes, docker, cicd, aws, linux