Remote
Senior Site Reliability Engineer (SRE) - acquird
Site Reliability Engineer
About the role
Lead the reliability and scalability of a fast-growing B2B SaaS platform on Azure, driving automation, monitoring, and incident response to ensure high availability and performance.
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on Azure, including Azure Kubernetes Service (AKS) and related services.
- Develop and manage CI/CD pipelines, ensuring rapid, reliable deployments across multiple environments.
- Implement comprehensive monitoring, alerting, and logging solutions using Azure Monitor, Log Analytics, and third‑party tools.
- Lead incident response, root cause analysis, and post‑mortem processes to continuously improve system reliability.
- Collaborate with development teams to embed SRE best practices into code reviews, architecture decisions, and release processes.
Requirements
- 5+ years of experience in site reliability or cloud engineering roles, with a strong focus on Azure.
- Proficiency in Kubernetes, container orchestration, and cloud-native tooling.
- Hands‑on experience with CI/CD tools (GitHub Actions, Azure DevOps, Jenkins) and scripting (PowerShell, Bash, Python).
- Deep understanding of monitoring, alerting, and incident management practices.
- Excellent communication skills and a collaborative mindset.
Skills
Azure, Kubernetes, CI/CD