Remote
Site Reliability Engineer - Multi‑Cloud Kubernetes
About the role
Site Reliability Engineer responsible for designing and operating a secure, observable multi‑cloud platform on AWS, Azure, and GCP using Kubernetes, infrastructure‑as‑code, and AI‑driven automation.
Key Responsibilities
- Architect, build, and maintain a scalable Kubernetes platform spanning AWS, Azure, and GCP.
- Implement infrastructure‑as‑code using Terraform and Helm to ensure repeatable, version‑controlled deployments.
- Develop automation scripts and tooling in Python to streamline provisioning, configuration, and incident response.
- Establish observability pipelines with Prometheus, Grafana, and logging solutions for proactive monitoring and alerting.
- Apply zero‑trust security controls, compliance frameworks, and SRE best practices to maintain reliability and protect data.
Requirements
- 5+ years of experience in SRE or DevOps roles with deep expertise in Kubernetes and container orchestration.
- Hands‑on experience managing workloads across AWS, Azure, and Google Cloud Platform.
- Proficiency with Terraform (or similar IaC tools) and Helm for automated infrastructure delivery.
- Strong scripting/programming skills in Python and familiarity with CI/CD pipelines.
- Experience implementing monitoring, alerting, and zero‑trust security models in production environments.
Skills
Kubernetes, AWS, Azure, Terraform, Python, Prometheus, Helm