Remote
Senior Site Reliability Engineer - Royal Caribbean Group
Site Reliability Engineer
Lead the design, implementation, and maintenance of highly available, scalable infrastructure for a global cruise line, leveraging Kubernetes, Docker, AWS, and Terraform to ensure reliability, performance, and rapid deployment.
About the role
Key Responsibilities
- Architect and maintain production-grade Kubernetes clusters, ensuring high availability and efficient resource utilization across multiple regions.
- Design and automate infrastructure as code using Terraform, integrating with AWS services to provision scalable, secure environments.
- Implement and manage CI/CD pipelines, container image builds, and deployment strategies to accelerate feature delivery while maintaining stability.
- Monitor system health with Prometheus and Grafana, proactively identifying and resolving performance bottlenecks and incidents.
- Collaborate with development, security, and product teams to define SLOs, SLIs, and incident response procedures.
Requirements
- 5+ years of experience in site reliability or DevOps roles supporting large-scale distributed systems.
- Proficient with Kubernetes, Docker, and AWS (EC2, EKS, S3, RDS).
- Hands‑on experience writing Terraform modules and managing IaC pipelines.
- Strong scripting skills in Bash or Python for automation and tooling.
- Excellent problem‑solving abilities and a proactive, collaborative mindset.
Skills
Kubernetes, Docker, AWS, Terraform