Hybrid

Site Reliability Engineer (SRE)

We are seeking a Site Reliability Engineer to strengthen platform reliability through observability, incident response, and infrastructure management using AWS and Terraform.

About the role

Key Responsibilities

Design and implement full-stack observability by evaluating and improving monitoring and metrics solutions
Lead blameless incident response and post-mortems to enhance system reliability
Mentor engineers in logging, monitoring, and reliability best practices
Define and track KPIs for platform reliability and performance with engineering leadership
Deploy infrastructure updates using Terraform on AWS
Build proofs of concept for logging and metrics across frameworks and languages

Requirements

Bachelor’s degree required
Five years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role
Strong experience with Infrastructure as Code, specifically Terraform
Hands-on experience managing cloud infrastructure in AWS
Knowledge of monitoring, logging, and observability tools

Skills

AWS, Terraform, Docker, Kubernetes, observability, incident response

Department: Engineering

Location: Nashville, Tennessee, United States

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted April 21, 2026