Onsite

Senior Linux Site Reliability Engineer - SpaceX

Site Reliability Engineer

Lead the design, scaling, and optimization of Kubernetes clusters on Linux, ensuring high availability and performance for critical business services.

About the role

Key Responsibilities

Architect, deploy, and maintain Kubernetes clusters across production environments, ensuring reliability and scalability.
Collaborate with development teams to integrate CI/CD pipelines and automate deployment workflows.
Implement monitoring, alerting, and logging solutions to proactively detect and resolve incidents.
Optimize resource utilization, cost, and performance through capacity planning and tuning.
Lead incident response, root‑cause analysis, and post‑mortem documentation.

Requirements

5+ years of experience in Linux system administration and site reliability engineering.
Deep expertise in Kubernetes, container runtimes, and related ecosystem tools.
Proficiency with automation tools (Ansible, Terraform, Helm) and scripting (Bash, Python).
Strong knowledge of monitoring/observability stacks (Prometheus, Grafana, ELK).
Excellent problem‑solving skills and ability to work in a fast‑paced, mission‑critical environment.

Skills

Kubernetes, Linux

Company: SpaceX

Department: Engineering

Location: Bastrop, Texas, United States

Experience: 5+ years

Tenure: full-time

Level: Senior

Posted June 26, 2026