On-site
Principal Site Reliability Engineer - Commonwealth Bank of Australia
Site Reliability Engineer
Lead the design and operation of highly available, scalable banking services, driving SRE best practices across cloud, Kubernetes, and CI/CD pipelines to deliver a world‑class digital banking experience.
About the role
Key Responsibilities
- Architect, build, and maintain resilient, scalable infrastructure for mission‑critical banking applications across multi‑cloud environments.
- Implement and evolve CI/CD pipelines, automated testing, and deployment strategies to accelerate feature delivery while ensuring reliability.
- Define and enforce SLOs, SLIs, and error budgets; monitor system health with advanced observability tools and respond to incidents with rapid root‑cause analysis.
- Collaborate with development, security, and product teams to embed SRE principles into the software development lifecycle.
- Mentor and grow a high‑performing SRE team, fostering a culture of continuous improvement and knowledge sharing.
Requirements
- 10+ years of experience in software engineering and at least 5 years in a senior SRE or DevOps role.
- Deep expertise in Kubernetes, container orchestration, and cloud platforms (AWS, GCP, or Azure).
- Proficiency with CI/CD tooling (GitHub Actions, Jenkins, Argo CD), infrastructure as code (Terraform, Pulumi), and monitoring (Prometheus, Grafana, Datadog).
- Strong scripting skills in Python or Bash, with a track record of automating complex operational tasks.
- Excellent communication, problem‑solving, and leadership abilities in a fast‑paced, customer‑centric environment.
Skills
Kubernetes, CI/CD, Python