Onsite
Senior Site Reliability Engineer Lead - Akamai
Site Reliability Engineer
About the role
Lead a high‑performing SRE team to design, build, and operate resilient Compute services, driving automation, observability, and strong incident response across cloud platforms.
Key Responsibilities
- Lead the design, implementation, and continuous improvement of highly available Compute services and supporting infrastructure.
- Mentor and coach a team of SREs, fostering a culture of ownership, collaboration, and rapid problem resolution.
- Architect and maintain CI/CD pipelines, container orchestration (Kubernetes), and deployment automation to accelerate feature delivery.
- Define and enforce observability standards using Prometheus, Grafana, and distributed tracing, ensuring proactive monitoring and alerting.
- Own incident response processes, conduct blameless post‑mortems, and drive root‑cause analysis to prevent recurrence.
- Collaborate with product, security, and platform teams to align reliability goals with business objectives.
Requirements
- 8+ years of experience in site reliability or DevOps roles, with 3+ years in a leadership capacity.
- Deep expertise in Kubernetes, Docker, and cloud platforms (AWS, GCP, or Azure).
- Proficiency in scripting (Python, Bash) and automation tooling (Terraform, Ansible).
- Strong background in monitoring, alerting, and incident management frameworks.
- Excellent communication skills and a proven ability to influence cross‑functional teams.
Skills
Kubernetes, Docker, CI/CD