onsite
Senior Site Reliability Engineer - Forto Logistics SE & Co. KG
Site Reliability Engineer
Lead the design, deployment, and operation of highly available, scalable cloud services using Kubernetes, Docker, and AWS. Drive automation, monitoring, and incident response to ensure optimal performance and reliability.
About the role
Key Responsibilities
- Architect and maintain production-grade Kubernetes clusters, ensuring high availability and efficient resource utilization.
- Implement CI/CD pipelines with GitOps principles, automating deployments and rollbacks across multiple environments.
- Design and maintain an observability stack using Prometheus, Grafana, and Loki, creating dashboards and alerting rules that catch problems before they become outages.
- Collaborate with development teams to enforce best practices in code quality, security, and infrastructure as code.
- Lead incident investigations, root‑cause analysis, and post‑mortem documentation to continuously improve reliability.
Requirements
- 5+ years of experience in site reliability or DevOps roles, with deep knowledge of Kubernetes and container orchestration.
- Proficiency with AWS services (EKS, EC2, S3, CloudWatch) and infrastructure automation tools (Terraform, Helm).
- Strong scripting skills in Python or Bash, and familiarity with CI/CD tools such as GitLab CI, Jenkins, or ArgoCD.
- Hands‑on experience with monitoring, alerting, and log aggregation tools (Prometheus, Grafana, Loki, ELK).
- Excellent problem‑solving abilities, communication skills, and a proactive mindset for continuous improvement.
Skills
Kubernetes, Docker, Prometheus, Grafana, AWS, CI/CD, Python