Remote
Software Engineer II, Site Reliability Engineering - Warner Bros. Discovery
Software Engineer
Mid‑level Site Reliability Engineer focused on building and operating scalable cloud infrastructure, automating deployments, and ensuring high availability of critical services using Python, Go, Kubernetes, and AWS.
About the role
Key Responsibilities
- Design, implement, and maintain highly available services on AWS using Kubernetes, Terraform, and CI/CD pipelines.
- Develop automation scripts and tools in Python and Go to streamline operations, incident response, and capacity planning.
- Monitor system health, set up alerting, and perform root‑cause analysis to improve reliability and performance.
- Collaborate with development and product teams to embed reliability best practices into the software development lifecycle.
- Participate in on‑call rotations, incident triage, and post‑mortem reviews to drive continuous improvement.
Requirements
- 2–4 years of experience in site reliability or DevOps engineering.
- Strong proficiency in Python or Go for automation and tooling.
- Hands‑on experience with Linux systems, Kubernetes orchestration, and AWS cloud services.
- Familiarity with infrastructure‑as‑code tools such as Terraform, and with CI/CD platforms (e.g., Jenkins, GitHub Actions).
- Experience with monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, ELK).
Skills
Python, Go, Linux, Kubernetes, AWS, Terraform, CI/CD