Remote
Site Reliability Engineer - NBCUniversal
The Site Reliability Engineer builds and operates scalable, highly available infrastructure for streaming and media services, using cloud platforms, container orchestration, and automation tools.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, fault‑tolerant services on AWS supporting streaming and media workloads.
- Develop automation and infrastructure‑as‑code using Terraform, Python, and Go to streamline provisioning and configuration.
- Manage containerized workloads with Kubernetes, ensuring performance, scaling, and reliability.
- Build and maintain CI/CD pipelines to enable rapid, safe deployments and continuous delivery.
- Monitor system health, troubleshoot incidents, and lead post‑mortem analyses to drive continuous improvement.
Requirements
- 3+ years of experience in site reliability or DevOps engineering, preferably in media or streaming environments.
- Strong proficiency in Python or Go for scripting and automation.
- Hands‑on experience running Docker containers and orchestrating them with Kubernetes at scale.
- Deep knowledge of AWS services (EC2, S3, RDS, Lambda, etc.) and networking concepts.
- Experience with Terraform or similar IaC tools and CI/CD platforms (Jenkins, GitLab CI, CircleCI).
Skills
Python, Go, Kubernetes, AWS, Terraform, CI/CD, Linux