On-site

Senior Site Reliability Engineer - Heartbeat AI GmbH

Site Reliability Engineer

Lead the design, deployment, and maintenance of highly available, scalable platform services using Kubernetes, Docker, and cloud-native observability tools. Drive automation, reliability, and performance across the entire infrastructure stack.

About the role

Key Responsibilities

Architect, implement, and operate production-grade Kubernetes clusters and containerized services.
Design and maintain CI/CD pipelines, ensuring rapid, reliable releases.
Implement monitoring, alerting, and logging with Prometheus, Grafana, and the ELK stack.
Automate infrastructure provisioning and configuration using Terraform and IaC best practices.
Collaborate with development teams to optimize application performance and resilience.
Respond to incidents, conduct post‑mortems, and drive continuous improvement.

Requirements

5+ years of experience in site reliability or DevOps roles.
Deep knowledge of Kubernetes, Docker, and cloud platforms (AWS preferred).
Proficiency with monitoring, logging, and alerting tools (Prometheus, Grafana, ELK).
Strong scripting skills (Python, Bash) and experience with CI/CD tools (GitHub Actions, Jenkins).
Hands‑on experience with Terraform or similar IaC tools.

Skills

Kubernetes, Docker, Prometheus, Grafana, CI/CD, AWS, Terraform

Company: Heartbeat AI GmbH
Department: Engineering
Location: Hamburg, Germany
Experience: 5+ years
Tenure: Full-time
Level: Senior

Posted June 21, 2026