Remote
Manager Site Reliability Engineer - Macquarie Group
Lead the Site Reliability Engineering practice for a global credit risk technology team, driving reliability, automation, and scalability using Python, Kubernetes, AWS, Terraform, and CI/CD pipelines.
About the role
Key Responsibilities
- Define and evolve the SRE practice, establishing standards for reliability, incident response, and observability across credit risk applications.
- Design, implement, and maintain highly available, scalable infrastructure on AWS using Kubernetes, Terraform, and automation scripts (Python, Bash).
- Develop and manage CI/CD pipelines that deliver seamless, repeatable deployments and rapid rollbacks.
- Implement monitoring, alerting, and performance‑tuning solutions; lead post‑mortems and drive continuous improvement.
- Collaborate with full‑stack engineering teams to embed reliability best practices early in the development lifecycle.
Requirements
- 5+ years of experience in site reliability, DevOps, or production engineering within a complex, high‑transaction environment.
- Strong proficiency in Python and Linux system administration.
- Hands‑on experience with Kubernetes orchestration, AWS services, and infrastructure‑as‑code tools such as Terraform.
- Proven track record building CI/CD pipelines and implementing robust monitoring/alerting frameworks.
- Excellent problem‑solving, communication, and leadership skills to mentor engineers and drive cross‑functional initiatives.
Skills
Python, Kubernetes, AWS, Terraform, CI/CD, Linux