Remote
Database SRE Manager AUS - CrowdStrike
Site Reliability Engineer
Lead a high‑performing Database SRE team, driving reliability, automation, and performance for mission‑critical data platforms using cloud, container, and infrastructure‑as‑code technologies.
About the role
Key Responsibilities
- Own the end‑to‑end reliability and performance of production database services (PostgreSQL, MySQL) across multi‑cloud environments.
- Build and maintain automation pipelines with Terraform, Python, and CI/CD tools to provision, patch, and scale database infrastructure.
- Implement observability solutions using Prometheus, Grafana, and logging stacks to detect, diagnose, and resolve incidents quickly.
- Lead on‑call rotations, incident response, and post‑mortem processes, fostering a culture of continuous improvement.
- Collaborate with engineering, security, and product teams to define SLAs, plan capacity, and design disaster‑recovery strategies.
- Mentor and grow a team of SREs, establishing best practices for code review, testing, and documentation.
Requirements
- 5+ years of production database operations experience, with deep expertise in PostgreSQL and MySQL.
- Strong background in cloud platforms (AWS) and container orchestration (Kubernetes).
- Proficiency in infrastructure‑as‑code (Terraform) and scripting/automation (Python, Bash).
- Hands‑on experience with monitoring, alerting, and logging tools such as Prometheus and Grafana.
- Demonstrated ability to manage and develop high‑performing SRE teams and to drive incident‑response processes.
Skills
PostgreSQL, MySQL, AWS, Kubernetes, Terraform, Python, Prometheus