remote

Site Reliability Engineer - NBCUniversal

The Site Reliability Engineer builds and operates scalable, highly available infrastructure for streaming and media services, using cloud platforms, container orchestration, and automation tools.

About the role

Key Responsibilities

Design, implement, and maintain highly available, fault‑tolerant services on AWS supporting streaming and media workloads.
Develop automation and infrastructure‑as‑code using Terraform, Python, and Go to streamline provisioning and configuration.
Manage containerized workloads with Kubernetes, ensuring performance, scaling, and reliability.
Build and maintain CI/CD pipelines to enable rapid, safe deployments and continuous delivery.
Monitor system health, troubleshoot incidents, and lead post‑mortem analyses to drive continuous improvement.

Requirements

3+ years of experience in site reliability or DevOps engineering, preferably in media or streaming environments.
Strong proficiency in Python or Go for scripting and automation.
Hands‑on experience with Kubernetes, Docker, and container orchestration at scale.
Deep knowledge of AWS services (EC2, S3, RDS, Lambda, etc.) and networking concepts.
Experience with Terraform or similar IaC tools and CI/CD platforms (Jenkins, GitLab CI, CircleCI).

Skills

Python, Go, Kubernetes, AWS, Terraform, CI/CD, Linux

Company: NBCUniversal

Department: Engineering

Location: Centennial, Colorado, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 26, 2026