onsite
Founding DevOps Engineer / SRE - Cygrid GmbH
Site Reliability Engineer
Lead the design and operation of scalable, resilient infrastructure for a fast‑growing startup, driving automation, reliability, and performance using Kubernetes, CI/CD pipelines, and cloud services.
About the role
Key Responsibilities
- Architect, deploy, and maintain production‑grade Kubernetes clusters on AWS, ensuring high availability and scalability.
- Design and implement CI/CD pipelines that follow GitOps principles, automating code delivery from commit to production.
- Build and manage an observability stack (Prometheus, Grafana, Loki) for real‑time monitoring, alerting, and incident response.
- Implement infrastructure as code using Terraform and Helm, enforcing version control and reproducibility.
- Collaborate with development teams to embed SRE practices, such as error budgets, blameless post‑mortems, and capacity planning.
- Lead incident management, root‑cause analysis, and continuous improvement of reliability metrics.
Requirements
- 5+ years of experience in DevOps or SRE roles, with a strong background in cloud-native technologies.
- Proficiency with Kubernetes, Docker, and container orchestration best practices.
- Hands‑on experience with AWS services (EKS, EC2, S3, CloudWatch) and IaC tools (Terraform, Helm).
- Solid scripting skills in Bash, Python, or Go for automation.
- Excellent problem‑solving and communication skills, with a proactive mindset and a strong sense of ownership.
Skills
Kubernetes, CI/CD, AWS, Terraform