Remote
Software Developer II - Site Reliability Engineering - Warner Bros. Discovery
Software Engineer
Senior SRE developer building and maintaining highly available, scalable infrastructure for media streaming services using Python, Go, Kubernetes, and AWS. Focus on automation, observability, and continuous delivery.
About the role
Key Responsibilities
- Design, implement, and maintain production‑grade infrastructure for media delivery pipelines using Kubernetes, Docker, and AWS services.
- Develop automation scripts in Python and Go to streamline deployment, scaling, and configuration management.
- Implement observability solutions with Prometheus, Grafana, and custom metrics to ensure high availability and performance.
- Collaborate with DevOps, security, and application teams to enforce best practices and improve reliability.
- Participate in on‑call rotations, incident response, and post‑mortem analysis to drive continuous improvement.
Requirements
- 3+ years of experience in site reliability engineering or related roles.
- Proficiency in Python and Go for scripting and microservice development.
- Hands‑on experience with Kubernetes, Docker, and AWS (EC2, EKS, S3, CloudWatch).
- Strong knowledge of infrastructure as code using Terraform or CloudFormation.
- Experience with monitoring, alerting, and log aggregation tools such as Prometheus, Grafana, and the ELK stack.
Skills
Python, Go, Kubernetes, Docker, AWS, Terraform, Prometheus, Grafana