onsite

Senior Site Reliability Engineer - Stack AV

Lead reliability initiatives for AI‑driven autonomous systems, designing scalable infrastructure, automating deployments, and ensuring high availability using Kubernetes, AWS, Terraform, and modern observability tools.

About the role

Key Responsibilities

Design, implement, and operate highly available, scalable infrastructure for AI‑powered autonomous trucking solutions.
Develop and maintain infrastructure-as-code (IaC) pipelines using Terraform and CI/CD tools to automate provisioning and releases.
Manage Kubernetes clusters on AWS, ensuring performance, security, and cost‑efficiency.
Build robust monitoring, alerting, and incident‑response frameworks with tools such as Prometheus, Grafana, and PagerDuty.
Collaborate with software, data science, and robotics teams to embed reliability best practices into the development lifecycle.

Requirements

5+ years of SRE or DevOps experience in cloud environments, preferably AWS.
Strong proficiency in Kubernetes orchestration and containerization.
Hands‑on experience with Terraform, and with Python or Go for automation and tooling.
Deep understanding of CI/CD pipelines, monitoring, logging, and incident management.
Track record of improving system reliability, performance, and scalability in complex, real‑time applications.

Skills

Kubernetes, AWS, Terraform, Python, Go, CI/CD

Company: Stack AV

Department: Engineering

Location: Pittsburgh, Pennsylvania, United States

Experience: 5+ years

Tenure: Full-time

Level: Senior

Posted June 22, 2026