onsite

Site Reliability Engineer, Diagnostics - Tesla

Seasoned Site Reliability Engineer needed to architect, scale, and maintain next‑generation diagnostics services for a growing fleet, using containerization, public cloud platforms, and cloud‑native tooling to keep services highly available and to plan capacity ahead of demand.

About the role

Key Responsibilities

Plan and analyze capacity for diagnostics services, proactively resizing and migrating infrastructure to meet demand.
Lead infrastructure change management, tuning and reshaping production environments for optimal performance.
Collaborate with software engineers to identify, troubleshoot, and resolve production incidents, ensuring minimal impact.
Design, validate, and exercise failover and disaster recovery plans, implementing graceful degradation policies.
Maintain and improve monitoring, alerting, and logging pipelines for real‑time visibility.

Requirements

5+ years of SRE experience in a fast‑moving, high‑scale environment.
Deep expertise with container orchestration (Kubernetes, Docker) and public cloud platforms (AWS, GCP, Azure).
Strong background in cloud‑native application design, CI/CD, and infrastructure as code.
Proven ability to perform capacity planning, performance tuning, and disaster recovery.
Excellent communication skills and a collaborative mindset.

Skills

Python, Bash, Kubernetes, Docker, Linux, electrical engineering

Company: Tesla

Department: Engineering

Location: Palo Alto, CA, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 312,000

Posted June 19, 2026