remote
Site Reliability Engineer - moneybird
As a Site Reliability Engineer, you will design, deploy, and maintain highly available cloud infrastructure on AWS with Kubernetes and Terraform, and drive performance, reliability, and continuous improvement through automation and observability tools.
About the role
Key Responsibilities
- Design, implement, and manage scalable, highly available infrastructure on AWS using Terraform and Kubernetes.
- Automate deployment pipelines with CI/CD tools, ensuring rapid, reliable releases.
- Monitor system health with Prometheus, Grafana, and custom alerts; troubleshoot incidents and conduct post‑mortems.
- Collaborate with development teams to embed reliability best practices into code and architecture.
- Implement security, compliance, and cost‑optimization strategies across the stack.
Requirements
- 3+ years of experience in site reliability or DevOps roles.
- Proficiency in Python scripting and automation.
- Hands‑on experience with Kubernetes, Docker, and AWS services (EC2, RDS, S3, EKS).
- Strong knowledge of Terraform, CI/CD pipelines, and monitoring/alerting tools.
- Excellent problem‑solving skills and a proactive, collaborative mindset.
Skills
Python, Kubernetes, AWS, Docker, Terraform, Prometheus, Grafana, CI/CD