Remote / Onsite
Site Reliability Engineer (SRE) - Zversal Pvt Ltd
Site Reliability Engineer
Mid‑level Site Reliability Engineer responsible for incident response, reliability improvements, and toil reduction on a fast‑moving fintech platform using Kubernetes, Docker, AWS, Python and Terraform.
About the role
Key Responsibilities
- Own on‑call duties and independently resolve incidents across the fintech infrastructure.
- Collaborate with global engineering teams to design and implement reliability enhancements.
- Automate operational tasks using Python scripts, Terraform, and CI/CD pipelines.
- Maintain and optimize Kubernetes clusters, Docker containers, and AWS services.
- Implement monitoring, alerting, and logging solutions to reduce toil and improve system observability.
Requirements
- 4–6 years of SRE or DevOps experience in a high‑scale environment.
- Hands‑on experience with monitoring/alerting platforms (Prometheus, Grafana, Datadog).
- Excellent communication skills and ability to work across distributed teams.
Skills
Kubernetes, Docker, AWS, Python, Terraform