onsite
Site Reliability Engineer - Axon
The Site Reliability Engineer builds and operates highly available, scalable cloud services, automates infrastructure, and improves reliability through monitoring, incident response, and performance optimization.
About the role
Key Responsibilities
- Design, implement, and maintain highly available services on AWS using Kubernetes, Terraform, and CI/CD pipelines.
- Develop automation scripts and tools in Python and Go to streamline operations and reduce manual toil.
- Monitor system health, set up alerts, and conduct root‑cause analysis for incidents to improve reliability.
- Collaborate with development teams to embed reliability best practices into the software development lifecycle.
- Participate in on‑call rotations, incident response, and post‑mortem reviews to drive continuous improvement.
Requirements
- 3+ years of experience in site reliability or DevOps engineering, preferably in a cloud‑native environment.
- Strong proficiency with AWS services, Kubernetes orchestration, and infrastructure‑as‑code tools such as Terraform.
- Hands‑on programming experience in Python and Go for automation and tooling.
- Experience building CI/CD pipelines and implementing monitoring, logging, and alerting solutions.
- Solid understanding of networking, security, and performance tuning in distributed systems.
Skills
Python, Go, Kubernetes, AWS, Terraform, CI/CD