Remote
Senior SRE Platform Architect - Bitdeer Technologies Group
Site Reliability Engineer
Lead the design and operation of resilient, scalable platform services, driving automation, observability, and incident response across multi‑cloud environments using Kubernetes, AWS, and Terraform.
About the role
Key Responsibilities
- Architect and maintain highly available, scalable platform infrastructure across on‑prem and cloud environments.
- Design and implement CI/CD pipelines, infrastructure as code, and automated deployment workflows.
- Define and enforce observability, monitoring, and alerting strategies to ensure service reliability.
- Lead incident response, post‑mortem analysis, and continuous improvement of reliability practices.
- Collaborate with development, security, and operations teams to embed SRE principles into product lifecycles.
Requirements
- 5+ years of SRE or platform engineering experience in large‑scale distributed systems.
- Proficiency with Kubernetes, AWS, Terraform, and modern CI/CD tools.
- Strong scripting skills (Python, Bash) and experience with monitoring tools (Prometheus, Grafana, ELK).
- Excellent problem‑solving, communication, and collaboration abilities.
- Experience with incident management frameworks and post‑mortem culture.
Skills
Kubernetes, AWS, Terraform, CI/CD