Remote
Site Reliability Engineer II - Mastercard
Site Reliability Engineer
The Site Reliability Engineer II designs, deploys, and maintains highly available, scalable infrastructure on AWS using Kubernetes, Docker, Terraform, and monitoring tools to keep digital payment services running reliably.
About the role
Key Responsibilities
- Design, implement, and manage scalable, highly available infrastructure on AWS using Kubernetes and Docker containers.
- Automate deployment pipelines with Terraform, CI/CD tools, and GitOps practices.
- Monitor system health and performance using Prometheus, Grafana, and custom alerting.
- Collaborate with development teams to optimize application performance and reliability.
- Respond to incidents, conduct root cause analysis, and implement preventive measures.
Requirements
- 3+ years of SRE or DevOps experience in a cloud environment.
- Proficiency with Docker and container orchestration on Kubernetes.
- Hands-on experience with AWS services (EKS, EC2, S3, CloudWatch).
- Strong scripting skills in Bash or Python, and experience with infrastructure as code using Terraform.
- Excellent problem-solving skills and a proactive approach to reliability.
Skills
Kubernetes, Docker, AWS, Terraform