Onsite
SRE Managing Consultant - Cloud Operating Model - Capgemini
Site Reliability Engineer
Lead cloud‑native reliability initiatives as an SRE Managing Consultant, designing operating models, automating infrastructure, and driving performance and resilience across multi‑cloud environments.
About the role
Key Responsibilities
- Design and implement cloud operating models that embed SRE principles for high availability, scalability, and security.
- Lead the automation of infrastructure provisioning and configuration using Terraform and CI/CD pipelines.
- Architect, deploy, and manage containerized workloads on Kubernetes across AWS and Azure platforms.
- Establish monitoring, logging, and incident‑response frameworks to ensure rapid detection and resolution of service disruptions.
- Mentor and guide client teams on best practices for reliability, performance tuning, and cost optimization.
Requirements
- 5+ years of hands‑on experience in Site Reliability Engineering or Cloud Operations.
- Deep expertise with Kubernetes, Terraform, and CI/CD tools (e.g., Jenkins, GitLab CI).
- Strong background in AWS services; experience with Azure is a plus.
- Proven ability to design monitoring and observability solutions using tools such as Prometheus, Grafana, or Datadog.
- Excellent communication skills and experience advising senior stakeholders on cloud strategy.
Skills
Kubernetes, Terraform, CI/CD, AWS