Remote
Site Reliability Engineer - Gammastack
Drive the reliability and performance of microservices on GCP, with a focus on Cloud Run, Kubernetes, and CI/CD pipelines, and build the observability and incident response practices needed for high availability and rapid issue resolution.
About the role
Key Responsibilities
- Own the reliability, availability, and performance of microservices and production workloads.
- Design and enhance resilient infrastructure on GCP, with a strong emphasis on Cloud Run, Kubernetes, and containerized services.
- Build and maintain observability across logs, metrics, tracing, alerting, and service health to detect and resolve issues early.
- Improve deployment safety through stronger CI/CD pipelines, release controls, rollback strategies, and environment consistency.
- Lead incident response and production readiness practices, including runbooks, post‑mortems, on‑call hygiene, capacity planning, and resilience testing.
Requirements
- Proven experience with GCP, Cloud Run, and Kubernetes in a production environment.
- Strong background in CI/CD tooling and pipeline automation.
- Hands‑on expertise with observability tooling across logs, metrics, tracing, and alerting.
- Experience leading incident response, runbook creation, and post‑mortem analysis.
- Excellent communication skills and a collaborative mindset.