Remote
Site Reliability Engineer (SRE) - Apple
Site Reliability Engineer focused on building and automating highly available, privacy‑preserving cloud services using Kubernetes, Terraform, Go, and Python, while driving observability and continuous delivery pipelines.
About the role
Key Responsibilities
- Design, implement, and operate highly available services that power private cloud intelligence while maintaining strict user‑privacy guarantees.
- Automate provisioning, configuration, and scaling of infrastructure using Terraform and Kubernetes operators.
- Develop and maintain monitoring, alerting, and performance dashboards with Prometheus, Grafana, and custom tooling.
- Write production‑grade code in Go and Python to improve reliability, self‑healing, and automation of critical workflows.
- Collaborate with development, security, and product teams to define SLO and SLA targets and to drive incident response and post‑mortem processes.
Requirements
- 3+ years of experience in site reliability or production engineering on Linux‑based platforms.
- Strong proficiency with Kubernetes orchestration, Terraform IaC, and containerized workloads.
- Hands‑on programming experience in Go and Python for tooling and automation.
- Deep understanding of monitoring, alerting, and observability stacks (Prometheus, Grafana, logging pipelines).
- Experience building CI/CD pipelines and implementing best practices for automated testing and deployment.
Skills
Linux, Kubernetes, Terraform, Go, Python, Prometheus, CI/CD