onsite
Site Reliability Engineer (SRE) - Core Platform Services - SAP
About the role
Join a leading cloud platform team as an SRE, driving reliability, automation, and scalability for a sovereign, secure public‑sector cloud using Kubernetes, AWS, Terraform, and CI/CD pipelines.
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure for a sovereign cloud platform on AWS.
- Automate deployment pipelines using Terraform, GitOps, and CI/CD tools to ensure rapid, reliable releases.
- Monitor system health, troubleshoot incidents, and implement proactive alerting and capacity planning.
- Collaborate with development teams to embed SRE best practices into application design and deployment.
- Document processes, runbooks, and post‑mortems to continuously improve reliability and resilience.
Requirements
- Proven experience as an SRE or DevOps engineer in a cloud environment.
- Strong hands‑on skills with Kubernetes, AWS services, and Terraform.
- Experience with CI/CD pipelines, monitoring (Prometheus, Grafana, or similar), and incident response.
- Solid scripting skills in Python or Bash for automation.
- Excellent problem‑solving skills and a collaborative mindset.
Skills
Kubernetes, AWS, Terraform, CI/CD, Python