Onsite
Database SRE Manager AUS
Site Reliability Engineer
About the role
Lead a team of Database Site Reliability Engineers, ensuring high availability, performance, and scalability of Apache Cassandra and Kafka deployments across AWS and Azure cloud environments.
Key Responsibilities
- Lead, mentor, and grow a team of Database SREs responsible for Cassandra and Kafka clusters.
- Design, implement, and maintain highly available, fault‑tolerant database architectures on AWS and Azure.
- Develop automation using infrastructure‑as‑code and configuration‑management tools (e.g., Terraform, Ansible) to streamline provisioning, scaling, and disaster recovery.
- Monitor system health, define SLOs/SLIs, and drive incident response and post‑mortem processes.
- Collaborate with development, security, and product teams to integrate reliability best practices into the software lifecycle.
Requirements
- 5+ years of experience operating large‑scale Cassandra and Kafka deployments in production.
- Strong expertise with AWS and Azure services, including networking, storage, and compute resources.
- Proficiency in Linux system administration and scripting (Bash, Python, or similar).
- Hands‑on experience with infrastructure‑as‑code tools such as Terraform or CloudFormation.
- Demonstrated ability to lead technical teams, drive reliability initiatives, and communicate effectively with stakeholders.
Skills
AWS, Azure, Linux, Terraform