Onsite

Site Reliability Engineer - Stack AV

We seek a Site Reliability Engineer to design, implement, and maintain highly available cloud infrastructure for AI‑driven autonomous trucking solutions, leveraging Kubernetes, AWS, and automation tools.

About the role

Key Responsibilities

Design, deploy, and operate scalable Kubernetes clusters on AWS to support AI and robotics workloads.
Develop and maintain infrastructure‑as‑code using Terraform and related automation frameworks.
Implement robust CI/CD pipelines for continuous delivery of services and updates.
Monitor system performance, troubleshoot incidents, and drive root‑cause analysis to improve reliability.
Collaborate with software, data science, and hardware teams to ensure seamless integration of autonomous system components.

Requirements

3+ years of experience in site reliability or DevOps roles, preferably in cloud‑native environments.
Proficiency in programming/scripting with Python and Go.
Strong hands‑on experience with Kubernetes, Docker, and AWS services (EKS, EC2, S3, etc.).
Expertise in infrastructure‑as‑code tools such as Terraform, and in configuration management.
Solid understanding of Linux systems, networking, and monitoring tools (Prometheus, Grafana, ELK).

Skills

Python, Go, Kubernetes, AWS, Terraform, CI/CD, Linux

Company: Stack AV

Department: Engineering

Location: Pittsburgh, Pennsylvania, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-level

Posted: June 22, 2026