Remote
Senior Site Reliability Engineer (SRE), Kubernetes Platform, IRAP - Cisco
Site Reliability Engineer
This Senior Site Reliability Engineer role focuses on building and operating a Kubernetes‑based platform that delivers reliable, scalable, and secure cloud services, and on supporting developer velocity through observability, automation, and compliance practices.
About the role
Key Responsibilities
- Design, implement, and maintain a highly available Kubernetes platform across multiple environments.
- Collaborate with cross‑functional teams to ensure platform reliability, observability, and security compliance.
- Develop and refine CI/CD pipelines, automation scripts, and monitoring solutions to accelerate deployment and reduce toil.
- Lead incident response, root cause analysis, and post‑mortem activities to continuously improve system resilience.
- Advise product teams on best practices for cloud architecture, cost optimization, and operational excellence.
Requirements
- 5+ years of experience in Site Reliability Engineering or DevOps roles.
- Deep expertise with Kubernetes, container orchestration, and cloud infrastructure.
- Proficiency with observability tools (Prometheus, Grafana, ELK) and experience with incident management.
- Strong scripting skills (Python, Bash) and experience with CI/CD tooling (GitHub Actions, Argo CD).
- Solid understanding of security best practices, compliance frameworks, and automation.