onsite
Tech Lead SRE & Cloud Platform - Bauhaus AG
Site Reliability Engineer
Lead the SRE team in designing, building, and operating scalable cloud platforms on AWS with Kubernetes and Terraform, ensuring high availability, performance, and automation across the organization.
About the role
Key Responsibilities
- Architect and maintain highly available, scalable cloud infrastructure on AWS, leveraging Kubernetes and Terraform for infrastructure-as-code.
- Lead the SRE team in implementing robust monitoring, alerting, and incident response processes to meet a 99.99% uptime target.
- Drive continuous improvement of CI/CD pipelines, deployment automation, and release management practices.
- Collaborate with development, security, and product teams to embed reliability and security best practices into the software delivery lifecycle.
- Mentor and coach team members, fostering a culture of ownership, learning, and operational excellence.
Requirements
- 5+ years of experience in Site Reliability Engineering or DevOps roles, with a strong focus on cloud-native technologies.
- Hands‑on expertise with Kubernetes, AWS services (EC2, EKS, RDS, S3), Terraform, and CI/CD tools (GitHub Actions, Jenkins, ArgoCD).
- Proven track record of designing and operating production‑grade systems at scale, including performance tuning and capacity planning.
- Strong scripting skills (Python, Bash) and familiarity with observability tools (Prometheus, Grafana, Loki, ELK).
- Excellent communication, problem‑solving, and leadership abilities in a fast‑paced environment.
Skills
Kubernetes, AWS, Terraform, CI/CD