onsite
Manager, Site Reliability Engineering - 6sense
Software Engineer
About the role
Lead a high‑performing SRE team to design, build, and operate scalable, resilient cloud infrastructure using Kubernetes, AWS, and modern CI/CD pipelines, ensuring uptime, performance, and rapid incident response.
Key Responsibilities
- Lead and mentor a team of SREs, driving best practices in reliability, automation, and incident management.
- Architect and maintain highly available, scalable Kubernetes clusters on AWS, ensuring secure and efficient deployments.
- Design and implement CI/CD pipelines, monitoring, and alerting systems to support rapid, reliable releases.
- Collaborate with product, engineering, and security teams to define SLAs and SLOs and to plan capacity.
- Own post‑mortem processes, root‑cause analysis, and continuous improvement initiatives.
Requirements
- 5+ years of SRE or DevOps experience in a fast‑moving SaaS environment.
- Deep expertise in container orchestration with Kubernetes and in AWS services (EKS, EC2, RDS, CloudWatch).
- Proven track record of building CI/CD pipelines (GitHub Actions, Argo CD, Jenkins) and observability stacks (Prometheus, Grafana, Loki).
- Strong leadership skills, with experience managing and scaling engineering teams.
- Excellent communication, problem‑solving, and incident response abilities.