Remote
Sr Site Reliability Engineer - Renaissance Learning
Site Reliability Engineer
Senior Site Reliability Engineer driving reliability, scalability, and automation for a global education technology platform using Kubernetes, Docker, AWS, and Terraform to ensure high availability and rapid incident response.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS using Terraform and Kubernetes.
- Automate deployment pipelines, configuration management, and monitoring with CI/CD tools and custom scripts.
- Lead incident response, root‑cause analysis, and post‑mortem documentation to improve system resilience.
- Collaborate with development, security, and product teams to embed reliability best practices into the software development lifecycle.
- Continuously evaluate and adopt new technologies to enhance performance, cost efficiency, and observability.
Requirements
- 5+ years of experience in site reliability or DevOps roles, managing production systems at scale.
- Proficiency with Docker and Kubernetes, including container orchestration at enterprise scale.
- Strong scripting skills (Python, Bash) and experience with IaC tools such as Terraform.
- Hands‑on experience with AWS services (EC2, EKS, RDS, CloudWatch, S3, IAM).
- Excellent problem‑solving, communication, and collaboration abilities.
Skills
Kubernetes, Docker, AWS, Terraform