onsite

Engineer 3, Site Reliability Engineer - Comcast

Experienced Site Reliability Engineer responsible for designing, automating, and operating scalable cloud infrastructure, ensuring high availability and performance of critical media and technology services.

About the role

Key Responsibilities

Design, build, and maintain highly available, fault‑tolerant services on AWS using infrastructure‑as‑code tools such as Terraform.
Develop and support container orchestration platforms (Kubernetes) and automate deployment pipelines with CI/CD frameworks.
Implement monitoring, alerting, and observability solutions (Prometheus, Grafana) to proactively detect and resolve incidents.
Collaborate with development and product teams to improve reliability, performance, and scalability of applications.
Lead incident response, perform root‑cause analysis, and drive post‑mortem improvements.

Requirements

5+ years of experience in site reliability or production engineering roles.
Strong proficiency with Linux systems, scripting (Python or Bash), and cloud platforms (AWS).
Hands‑on experience with Kubernetes, Terraform, and CI/CD tools (Jenkins, GitLab CI, or similar).
Deep understanding of monitoring, logging, and alerting frameworks (Prometheus, Grafana, ELK).
Proven track record of incident management, troubleshooting complex distributed systems, and driving automation.

Skills

Linux, Kubernetes, Terraform, Python, AWS, CI/CD, Prometheus

Company: Comcast

Department: Engineering

Location: Tamil Nadu, India

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 25, 2026