remote

Senior Site Reliability Engineer - Royal Caribbean Group

Site Reliability Engineer

Lead the design, implementation, and maintenance of highly available, scalable infrastructure for a global cruise line, leveraging Kubernetes, Docker, AWS, and Terraform to ensure reliability, performance, and rapid deployment.

About the role

Key Responsibilities

Architect and maintain production-grade Kubernetes clusters, ensuring high availability and efficient resource utilization across multiple regions.
Design and automate infrastructure as code using Terraform, integrating with AWS services to provision scalable, secure environments.
Implement and manage CI/CD pipelines, container image builds, and deployment strategies to accelerate feature delivery while maintaining stability.
Monitor system health with Prometheus and Grafana, proactively identifying and resolving performance bottlenecks and incidents.
Collaborate with development, security, and product teams to define SLOs, SLIs, and incident response procedures.

Requirements

5+ years of experience in site reliability or DevOps roles supporting large-scale distributed systems.
Proficient with Kubernetes, Docker, and AWS (EC2, EKS, S3, RDS).
Hands-on experience writing Terraform modules and managing IaC pipelines.
Strong scripting skills in Bash or Python for automation and tooling.
Excellent problem-solving abilities and a proactive, collaborative mindset.

Skills

Kubernetes, Docker, AWS, Terraform

Company: Royal Caribbean Group

Department: Engineering

Location: Miramar, Florida, United States

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 21, 2026