Remote
Sr. Site Reliability Engineer - Versant
Site Reliability Engineer
Senior Site Reliability Engineer driving reliability, scalability, and automation for high‑traffic media services using Kubernetes, Docker, AWS, and Terraform.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure for media streaming and content delivery platforms.
- Lead incident response, root‑cause analysis, and post‑mortem documentation to continuously improve system reliability.
- Automate deployment pipelines with CI/CD tools, Terraform, and GitOps practices to accelerate feature delivery.
- Monitor system health using Prometheus, Grafana, and custom alerts; optimize performance and cost across AWS services.
- Collaborate with development, security, and product teams to embed reliability best practices into the software development lifecycle.
Requirements
- 5+ years of experience in site reliability or DevOps roles within high‑traffic, media‑centric environments.
- Proficiency with Kubernetes, Docker, and container orchestration at scale.
- Hands‑on experience with AWS services (EC2, ECS/EKS, RDS, S3, CloudWatch) and IaC using Terraform.
- Strong scripting skills in Python or Bash and familiarity with CI/CD pipelines (GitHub Actions, Jenkins, Argo CD).
- Excellent problem‑solving, communication, and collaboration abilities in a fast‑paced, cross‑functional team.
Skills
Kubernetes, Docker, AWS, Terraform