onsite
Senior Site Reliability Engineer - Pave Bank
Site Reliability Engineer
Senior Site Reliability Engineer driving reliability, automation, and performance for cloud-native services using GCP, Kubernetes, Docker, Python, Go, and Grafana.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on GCP, ensuring 99.99% uptime for mission‑critical services.
- Build and manage Kubernetes clusters, including deployment pipelines, rolling updates, and cluster autoscaling.
- Develop automation scripts in Python and Go to streamline operations, monitoring, and incident response.
- Configure and maintain an observability stack with Grafana, Prometheus, and logging solutions to provide real‑time insights.
- Collaborate with development teams to embed SRE best practices into CI/CD pipelines and code reviews.
- Lead root‑cause analysis, post‑mortem documentation, and continuous improvement initiatives.
Requirements
- 5+ years of experience in site reliability engineering or DevOps roles.
- Proficiency with GCP services (Compute Engine, Kubernetes Engine, Cloud Storage, Pub/Sub).
- Strong scripting skills in Python and Go, with experience building reusable libraries.
- Hands‑on experience managing Kubernetes clusters, Helm charts, and container registries.
- Deep understanding of monitoring, alerting, and incident management using Grafana, Prometheus, and related tools.
Skills
GCP, Kubernetes, Docker, Python, Go, Grafana