remote
Site Reliability Engineer - Asylon
We seek a Site Reliability Engineer to design, automate, and maintain highly available cloud infrastructure, focusing on Kubernetes, CI/CD pipelines, and observability for AI‑driven security robotics.
About the role
Key Responsibilities
- Design, implement, and operate scalable Kubernetes clusters on AWS to support AI‑powered security robotics.
- Develop and maintain infrastructure‑as‑code using Terraform and automate deployment pipelines with Python/Go scripts.
- Implement robust monitoring, logging, and alerting solutions to ensure high availability and rapid incident response.
- Collaborate with development and security teams to embed reliability best practices into the software lifecycle.
- Participate in the on‑call rotation, perform root‑cause analysis, and drive continuous improvement of system performance.
Requirements
- 3+ years of experience in site reliability or DevOps roles, preferably in cloud‑native environments.
- Strong proficiency in Python and Go for automation and tooling.
- Hands‑on experience with Kubernetes orchestration, AWS services, and Terraform.
- Solid understanding of CI/CD concepts, monitoring (Prometheus, Grafana), and logging (ELK/EFK stacks).
- Excellent problem‑solving skills and the ability to work in fast‑paced, cross‑functional teams.
Skills
Python, Go, Kubernetes, AWS, Terraform