onsite

Lead Site Reliability Engineer - Relativity

Site Reliability Engineer

Lead the reliability and performance of RelativityOne, driving SRE best practices across services, monitoring, incident response, and automation using Kubernetes, Docker, AWS, and observability tools.

About the role

Key Responsibilities

Own end‑to‑end reliability of core platform services, ensuring high availability and fault tolerance.
Design, implement, and maintain scalable monitoring, alerting, and incident response workflows with Prometheus, Grafana, and PagerDuty.
Lead automation of deployment pipelines and configuration management using CI/CD tools and infrastructure as code.
Collaborate with development teams to embed SRE principles into feature design and code reviews.
Drive capacity planning, performance tuning, and cost optimization across AWS infrastructure.

Requirements

5+ years of SRE or DevOps experience with large-scale distributed systems.
Proficiency with Kubernetes, Docker, and cloud platforms (AWS preferred).
Strong scripting skills (Python, Bash) and experience with CI/CD pipelines.
Proven experience with incident management, root cause analysis, and post‑mortems.
Effective communication and mentorship abilities for cross‑functional teams.

Skills

Kubernetes, Docker, AWS, Prometheus, Grafana, CI/CD

Company: Relativity

Department: Engineering

Location: Illinois, United States

Experience: 7+ years

Tenure: Full-time

Level: Lead

Salary: 224,000

Posted June 23, 2026