Onsite
Senior Site Reliability Engineer - RBC
Site Reliability Engineer
The Senior Site Reliability Engineer designs, deploys, and maintains highly available insurance technology platforms using Kubernetes, Docker, and CI/CD pipelines, and is responsible for monitoring, incident response, and continuous improvement of cloud-based services.
About the role
Key Responsibilities
- Design, implement, and manage scalable, resilient infrastructure for insurance technology applications using Kubernetes and Docker.
- Develop and maintain CI/CD pipelines to automate build, test, and deployment processes across cloud environments.
- Implement comprehensive monitoring, alerting, and logging solutions (Prometheus, Grafana, ELK) to ensure high availability and performance.
- Lead incident response, root cause analysis, and post‑mortem documentation to drive continuous reliability improvements.
- Collaborate with development, security, and product teams to embed SRE best practices into the software development lifecycle.
Requirements
- 5+ years of experience in site reliability engineering or related roles.
- Proficiency with Kubernetes, Docker, and cloud platforms (AWS, Azure, or GCP).
- Strong scripting skills in Python and Bash for automation.
- Hands‑on experience with CI/CD tools (Jenkins, GitLab CI, ArgoCD) and monitoring stacks.
- Excellent problem‑solving skills and a proactive approach to improving system reliability.
Skills
Kubernetes, Docker, CI/CD, Python