Onsite
Senior Site Reliability Engineer (SRE) - Ninox Software GmbH
Site Reliability Engineer
About the role
Lead reliability and automation efforts for cloud-native services, improving performance and scalability and driving incident response with Kubernetes, Docker, AWS, and modern observability tools.
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS using Terraform and Kubernetes.
- Build and manage CI/CD pipelines that follow GitOps principles, ensuring rapid and reliable deployments.
- Monitor system health with Prometheus, Grafana, and custom alerts; conduct post-incident reviews and root-cause analyses.
- Collaborate with development teams to embed reliability best practices into the software development lifecycle.
- Automate operational tasks with scripting (Python, Bash) and configuration management.
Requirements
- 5+ years of experience in SRE or DevOps roles, with deep knowledge of Kubernetes and container orchestration.
- Proficiency in AWS services (EC2, EKS, RDS, CloudWatch) and IaC tools such as Terraform.
- Strong scripting skills in Python or Bash and familiarity with CI/CD tools (GitLab CI, Jenkins, ArgoCD).
- Experience with monitoring, logging, and alerting stacks (Prometheus, Grafana, Loki, ELK).
- Excellent problem-solving and communication skills, and the ability to work under pressure.
Skills
Kubernetes, Docker, AWS, Terraform, Prometheus, Grafana