Hybrid
Site Reliability Engineer (SRE)
About the role
We are seeking a Site Reliability Engineer to strengthen platform reliability through observability, incident response, and infrastructure management using AWS and Terraform.
Key Responsibilities
- Design and implement full-stack observability by evaluating and improving monitoring and metrics solutions
- Lead incident response and blameless post-mortems to improve system reliability
- Mentor engineers in logging, monitoring, and reliability best practices
- Define and track platform reliability and performance KPIs in partnership with engineering leadership
- Deploy infrastructure updates using Terraform on AWS
- Build proofs of concept for logging and metrics across frameworks and languages
Requirements
- Bachelor’s degree required
- Five years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role
- Strong experience with Infrastructure as Code, specifically Terraform
- Hands-on experience managing cloud infrastructure in AWS
- Knowledge of monitoring, logging, and observability tools
Skills
AWS, Terraform, Docker, Kubernetes, Observability, Incident response