onsite

Sr Manager, Site Reliability Engineering - FIS

Software Engineer

Lead a high‑performing SRE team to build and maintain a resilient, scalable payments platform using AWS, Kubernetes, and advanced monitoring, driving proactive reliability and performance improvements.

About the role

Key Responsibilities

Lead and mentor a team of SRE engineers to design, implement, and operate highly available payment processing services on AWS.
Architect and maintain Kubernetes clusters, CI/CD pipelines, and infrastructure-as-code for rapid, reliable deployments.
Define and enforce reliability SLAs, run post‑incident reviews, and drive continuous improvement of incident response processes.
Collaborate with development, security, and product teams to embed reliability best practices into the software development lifecycle.
Implement and evolve monitoring, alerting, and observability solutions (Prometheus, Grafana, etc.) to detect and remediate performance bottlenecks.

Requirements

10+ years of experience in large‑scale distributed systems, with at least 5 years in a senior SRE or DevOps leadership role.
Deep expertise in AWS services (EC2, RDS, ECS/EKS, CloudWatch) and Kubernetes cluster management.
Proven track record building CI/CD pipelines (GitHub Actions, Jenkins, ArgoCD) and automating infrastructure with Terraform or CloudFormation.
Strong background in monitoring, alerting, and incident management using Prometheus, Grafana, and PagerDuty.
Excellent communication skills and a collaborative mindset to work across engineering, product, and operations teams.

Skills

AWS, Kubernetes, CI/CD

Company: FIS

Department: Engineering

Location: Tamil Nadu, India

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 20, 2026