Remote
Senior Site Reliability Engineer - Mastercard
Senior Site Reliability Engineer driving reliability, scalability, and automation for high‑availability services using Kubernetes, Docker, AWS, Terraform, and advanced monitoring tools.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS using Terraform and Kubernetes.
- Automate deployment pipelines and configuration management with CI/CD tools, ensuring rapid, reliable releases.
- Implement robust monitoring, alerting, and incident response processes to maintain service level objectives.
- Collaborate with development teams to embed reliability best practices into application design.
- Lead root‑cause analysis, post‑mortems, and continuous improvement initiatives.
Requirements
- 5+ years of SRE or DevOps experience in a cloud‑native environment.
- Proficiency with Kubernetes, Docker, and AWS services (EKS, EC2, RDS, S3).
- Strong scripting skills (Python, Bash) and experience with Terraform or similar IaC tools.
- Hands‑on experience with monitoring/observability stacks (Prometheus, Grafana, ELK).
- Excellent problem‑solving, communication, and collaboration skills.
Skills
Kubernetes, Docker, AWS, Terraform, CI/CD