On-site

Senior Expert Site Reliability Engineer - Vodafone GmbH

Site Reliability Engineer

Lead the design, implementation, and maintenance of highly available, scalable infrastructure using Kubernetes, Docker, and AWS. Drive automation, monitoring, and incident response to ensure optimal system performance and reliability.

About the role

Key Responsibilities

Architect and maintain production-grade Kubernetes clusters, ensuring high availability and scalability across multiple regions.
Design and implement CI/CD pipelines with GitOps principles, automating deployments and rollbacks.
Develop and maintain monitoring dashboards using Prometheus and Grafana, and set up alerting for critical incidents.
Collaborate with development teams to embed reliability best practices into application design.
Lead incident investigations, root cause analysis, and post‑mortem documentation to continuously improve system resilience.

Requirements

5+ years of experience in Site Reliability Engineering or DevOps roles.
Deep expertise with Kubernetes, Docker, and cloud platforms (AWS preferred).
Strong scripting skills in Python and experience with infrastructure-as-code (IaC) tools such as Terraform and CloudFormation.
Proven track record of building automated monitoring, alerting, and incident response workflows.
Excellent communication skills and ability to mentor junior engineers.

Skills

Kubernetes, Docker, AWS, Prometheus, Grafana, Python

Company: Vodafone GmbH

Department: Engineering

Location: Düsseldorf, Germany

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 22, 2026