remote

Agentic Reliability Engineer - SAP

Software Engineer

Drive end‑to‑end reliability for cloud‑native services by automating incident response, monitoring, and capacity planning with Kubernetes, AWS, Python, and Terraform, ensuring high availability and performance.

About the role

Key Responsibilities

Design, implement, and maintain observability, alerting, and incident response workflows for multi‑region cloud services.
Automate deployment, scaling, and configuration of Kubernetes clusters and associated infrastructure using Terraform and CI/CD pipelines.
Collaborate with development teams to embed reliability best practices into the software development lifecycle.
Analyze post‑incident reports and root cause analyses, and implement preventive measures to reduce MTTR.
Participate in on‑call rotations, providing rapid response to production incidents and coordinating cross‑functional resolution.

Requirements

3+ years of experience in Site Reliability Engineering or DevOps roles.
Proficiency with Kubernetes, AWS services (EKS, CloudWatch, Lambda), and infrastructure as code (Terraform).
Strong scripting skills in Python or Bash for automation and tooling.
Experience with monitoring/alerting platforms such as Prometheus, Grafana, or Datadog.
Excellent problem‑solving skills and a proactive, collaborative mindset.

Skills

Kubernetes, AWS, Python, Terraform

Company: SAP

Department: Engineering

Location: Budapest, Hungary

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 21, 2026