onsite

Senior Site Reliability Engineer - Pave Bank

Senior Site Reliability Engineer driving reliability, automation, and performance for cloud-native services using GCP, Kubernetes, Docker, Python, Go, and Grafana.

About the role

Key Responsibilities

Design, implement, and maintain highly available, scalable infrastructure on GCP, ensuring 99.99% uptime for mission‑critical services.
Build and manage Kubernetes clusters, including deployment pipelines, rolling updates, and cluster autoscaling.
Develop automation scripts in Python and Go to streamline operations, monitoring, and incident response.
Configure and maintain the observability stack (Grafana, Prometheus, and logging solutions) to provide real‑time insights.
Collaborate with development teams to embed SRE best practices into CI/CD pipelines and code reviews.
Lead root‑cause analysis, post‑mortem documentation, and continuous improvement initiatives.

Requirements

5+ years of experience in site reliability engineering or DevOps roles.
Proficiency with GCP services (Compute Engine, Kubernetes Engine, Cloud Storage, Pub/Sub).
Strong scripting skills in Python and Go, with experience building reusable libraries.
Hands‑on experience managing Kubernetes clusters, Helm charts, and container registries.
Deep understanding of monitoring, alerting, and incident management using Grafana, Prometheus, and related tools.

Skills

GCP, Kubernetes, Docker, Python, Go, Grafana

Company: Pave Bank

Department: Engineering

Location: Kuala Lumpur, Malaysia

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 18, 2026