onsite
Sr. Site Reliability Engineer - Visa
Site Reliability Engineer
Senior Site Reliability Engineer responsible for the reliability, scalability, and automation of Visa’s global payment platform. The role works with Kubernetes, Docker, CI/CD pipelines, AWS, and Python, and maintains high availability and performance through monitoring and incident response.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure for Visa’s payment services on Kubernetes and AWS.
- Develop and manage CI/CD pipelines that automate deployments, rollbacks, and blue‑green releases.
- Implement observability solutions with Prometheus, Grafana, and custom alerting to detect and resolve incidents proactively.
- Collaborate with development teams to embed reliability best practices into the software development lifecycle.
- Lead post‑mortem analyses, root cause investigations, and continuous improvement initiatives.
Requirements
- 5+ years of SRE or DevOps experience in a large, distributed environment.
- Proficiency with Kubernetes, Docker, and cloud platforms (AWS preferred).
- Strong scripting skills in Python and Bash for automation.
- Hands‑on experience with CI/CD tools (GitHub Actions, Jenkins, Argo CD).
- Deep understanding of monitoring, alerting, and incident response practices.
Skills
Kubernetes, Docker, CI/CD, AWS, Python, Prometheus