onsite
Senior Site Reliability Engineer - Support - FIS
Site Reliability Engineer
The Senior Site Reliability Engineer - Support is responsible for diagnosing, troubleshooting, and resolving complex production issues on an AI‑enabled autonomous banking platform, using Kubernetes, Docker, AWS, Python, and monitoring tools such as AWS CloudWatch, Prometheus, and Grafana to ensure high availability and customer satisfaction.
About the role
Key Responsibilities
- Diagnose, troubleshoot, and resolve complex production and customer issues across the autonomous banking platform.
- Maintain and improve Kubernetes and Docker deployments, ensuring scalability and reliability.
- Implement and manage monitoring, alerting, and incident response workflows using AWS CloudWatch, Prometheus, and Grafana.
- Collaborate with development, security, and product teams to root‑cause incidents and drive permanent fixes.
- Document troubleshooting procedures, runbooks, and post‑incident analyses for continuous improvement.
Requirements
- 5+ years of experience in site reliability engineering or a related field.
- Proficiency with Kubernetes, Docker, and AWS services (EC2, EKS, S3, CloudWatch).
- Strong scripting skills in Python and experience with CI/CD pipelines.
- Hands‑on experience with monitoring, alerting, and incident management tools.
- Excellent communication skills and a customer‑focused mindset.
Skills
Kubernetes, Docker, AWS, Python