onsite
Senior Site Reliability Engineer - airasia
Site Reliability Engineer
The Senior Site Reliability Engineer is responsible for designing, implementing, and operating highly available cloud infrastructure, automating deployments, and ensuring the performance and reliability of critical services using Kubernetes, Terraform, and AWS.
About the role
Key Responsibilities
- Design, build, and maintain scalable, highly available infrastructure on AWS using IaC tools such as Terraform.
- Develop and operate container orchestration platforms (Kubernetes) to support microservice architectures.
- Implement robust monitoring, alerting, and observability solutions with Prometheus, Grafana, and related tooling.
- Automate CI/CD pipelines, release processes, and incident response workflows using Python and industry‑standard tools.
- Collaborate with development and product teams to improve reliability, performance, and cost efficiency of services.
Requirements
- 5+ years of experience in site reliability or DevOps engineering, with deep expertise in Linux systems.
- Proven hands‑on experience with Kubernetes, Docker, and cloud platforms (AWS preferred).
- Strong proficiency in infrastructure as code (Terraform, CloudFormation) and scripting (Python, Bash).
- Experience building monitoring, logging, and alerting pipelines (Prometheus, Grafana, ELK).
- Solid understanding of networking, security, and CI/CD concepts, with a track record of automating production workflows.
Skills
Linux, Kubernetes, Terraform, Python, AWS, Prometheus, CI/CD