Remote
Junior Site Reliability Engineer, Azure Platform - UST
Site Reliability Engineer
About the role
Join a cloud operations team as a Junior Site Reliability Engineer, focusing on Azure‑based infrastructure, Kubernetes orchestration, and automated monitoring to ensure high availability and performance of cloud‑native services.
Key Responsibilities
- Monitor and maintain the health, performance, and availability of Azure‑hosted services and Kubernetes clusters.
- Develop and refine automation scripts (Python, Bash) to streamline routine operational tasks and incident response.
- Implement infrastructure‑as‑code using Terraform to provision and manage cloud resources consistently.
- Configure and tune monitoring solutions such as Prometheus and Azure Monitor, creating alerts and dashboards for proactive issue detection.
- Collaborate with senior SREs, developers, and cloud engineers to troubleshoot incidents, perform root‑cause analysis, and drive continuous improvement.
Requirements
- Foundational knowledge of Microsoft Azure services (VMs, networking, storage, IAM).
- Hands‑on experience with Kubernetes concepts (pods, deployments, services) in a cloud environment.
- Proficiency in scripting languages, preferably Python or Bash, for automation and tooling.
- Familiarity with infrastructure‑as‑code tools, especially Terraform.
- Experience with monitoring and alerting platforms such as Prometheus, Grafana, or Azure Monitor.
Skills
Kubernetes, Terraform, Python, Prometheus