remote
Senior Site Reliability Engineer Azure Platform - UST
Site Reliability Engineer
The Senior Site Reliability Engineer designs, builds, and operates highly available, scalable Azure-based systems, using AKS, CI/CD pipelines, IaC, and observability tools to improve reliability and reduce operational toil.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable Azure infrastructure, including Azure Kubernetes Service (AKS) clusters and supporting services.
- Build and manage CI/CD pipelines that automate application delivery, configuration, and infrastructure changes.
- Apply Infrastructure as Code (IaC) practices using tools such as Terraform or ARM templates to ensure repeatable, versioned deployments.
- Implement observability solutions (metrics, logs, and traces) to monitor system health, detect incidents, and drive continuous improvement.
- Collaborate with platform, application, and security teams to enforce SRE best practices, reduce toil, and improve reliability.
Requirements
- 7–13 years of IT experience with a strong focus on Cloud, DevOps, and Site Reliability Engineering.
- Deep expertise in Microsoft Azure services, especially Azure Kubernetes Service (AKS).
- Proven experience building CI/CD pipelines and managing IaC for large-scale deployments.
- Strong knowledge of observability tools (e.g., Prometheus, Grafana, Azure Monitor) and incident response processes.
- Excellent communication skills and ability to work cross-functionally in a fast-paced environment.