remote

Site Reliability Engineer (SRE) - Apple

Site Reliability Engineer focused on building and automating highly available, privacy‑preserving cloud services using Kubernetes, Terraform, Go, and Python, while driving observability and continuous delivery pipelines.

About the role

Key Responsibilities

Design, implement, and operate highly available services that power private cloud intelligence while maintaining strict user‑privacy guarantees.
Automate provisioning, configuration, and scaling of infrastructure using Terraform and Kubernetes operators.
Develop and maintain monitoring, alerting, and performance dashboards with Prometheus, Grafana, and custom tooling.
Write production‑grade code in Go and Python to improve reliability, self‑healing, and automation of critical workflows.
Collaborate with development, security, and product teams to define SLO and SLA targets and to drive incident response and post‑mortem processes.

Requirements

3+ years of experience in site reliability or production engineering on Linux‑based platforms.
Strong proficiency with Kubernetes orchestration, Terraform IaC, and containerized workloads.
Hands‑on programming experience in Go and Python for tooling and automation.
Deep understanding of monitoring, alerting, and observability stacks (Prometheus, Grafana, logging pipelines).
Experience building CI/CD pipelines and implementing best practices for automated testing and deployment.

Skills

Linux, Kubernetes, Terraform, Go, Python, Prometheus, CI/CD

Company: Apple

Department: Engineering

Location: London, ENG, United Kingdom

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 20, 2026