onsite

Senior Site Reliability Engineer - NewDay

Site Reliability Engineer

As a Senior Site Reliability Engineer, you will drive reliability, performance, and automation across a cloud-native platform using Python, Kubernetes, Docker, and AWS. You will lead initiatives to eliminate toil, implement observability, and shape modern SRE practices.

About the role

Key Responsibilities

Design, build, and maintain highly available, scalable services on Kubernetes and AWS.
Automate deployment pipelines, configuration management, and incident response using Terraform, CI/CD, and scripting.
Implement an observability stack (Prometheus, Grafana, Loki) to monitor performance, detect anomalies, and drive proactive improvements.
Collaborate with development teams to embed reliability best practices into code reviews and release processes.
Lead post‑mortem analysis, root‑cause investigations, and continuous improvement initiatives.

Requirements

5+ years of experience in site reliability or DevOps roles.
Strong proficiency in Python and/or Go for automation and tooling.
Hands‑on experience with Kubernetes, Docker, and cloud infrastructure (AWS).
Expertise in IaC (Terraform) and CI/CD pipelines.
Deep understanding of monitoring, alerting, and incident management.

Skills

Python, Kubernetes, Docker, AWS, Terraform, Prometheus, Grafana

Company: NewDay

Department: Engineering

Location: London, ENG, United Kingdom

Experience: 5+ years

Tenure: full-time

Level: Senior

Posted June 19, 2026