remote
Site Reliability Engineer - MPB
A Site Reliability Engineer role focused on automating delivery pipelines, building self‑healing systems, and driving cloud‑native reliability with Kubernetes, CI/CD, and monitoring and observability tools.
About the role
Key Responsibilities
- Design, implement, and maintain CI/CD pipelines that accelerate feature delivery while ensuring reliability.
- Automate infrastructure provisioning and configuration using infrastructure-as-code (IaC) tools to reduce manual effort.
- Build and manage monitoring, alerting, and incident response workflows for a self‑healing production environment.
- Collaborate with development teams to embed reliability best practices into the software development lifecycle.
- Analyze system performance, troubleshoot incidents, and conduct post‑mortems to drive continuous improvement.
Requirements
- Proven experience in DevOps and Cloud Engineering, with hands‑on work in Kubernetes and container orchestration.
- Strong scripting skills (Python, Bash) and familiarity with IaC tools such as Terraform or CloudFormation.
- Deep understanding of monitoring and observability platforms (Prometheus, Grafana, ELK, or similar).
- Experience with incident management, root cause analysis, and post‑mortem processes.
- Excellent communication skills and a collaborative mindset.