Onsite

Lead Site Reliability Engineer (SRE) - AWS/Linux/Windows - Cognizant

Site Reliability Engineer

Lead a compute reliability team, driving operational excellence across Linux, Windows, and AWS environments using SRE principles, automation, and performance tuning.

About the role

Key Responsibilities

Lead and mentor a cross‑functional compute reliability team covering Linux/Unix, Windows, and AWS platforms.
Design, implement, and maintain automated monitoring, alerting, and incident‑response workflows to reduce mean time to recovery (MTTR).
Apply Site Reliability Engineering practices to improve system availability, performance, and scalability.
Collaborate with development and infrastructure teams to define service level objectives (SLOs) and service level indicators (SLIs).
Drive continuous improvement by identifying operational toil and implementing automation, scripting, and infrastructure‑as‑code solutions.

Requirements

10+ years of experience in systems administration or reliability engineering, with deep expertise in Linux (preferred) and solid working knowledge of Windows.
Extensive hands‑on experience managing production workloads on AWS, including EC2, S3, RDS, and networking services.
Proven track record of implementing SRE methodologies, incident management, and performance tuning at scale.
Strong scripting/automation skills (e.g., Python, Bash, PowerShell) and familiarity with infrastructure‑as‑code tools.
Excellent communication and leadership abilities to guide teams and influence stakeholders.

Skills

AWS, Linux

Company: Cognizant

Department: Engineering

Location: Hartford, Connecticut, United States

Experience: 7+ years

Tenure: Full-time

Level: Lead

Salary: 130,000

Posted June 24, 2026