remote

Site Reliability Engineer, Kubernetes Platform Starshield - SpaceX

Site Reliability Engineer

Lead the reliability and scalability of Starshield’s Kubernetes platform, ensuring high availability, automated deployment, and robust monitoring across a global satellite constellation using AWS/GCP, CI/CD pipelines, and advanced observability tools.

About the role

Key Responsibilities

Design, implement, and maintain a highly available Kubernetes platform that supports the Starshield satellite constellation’s mission-critical workloads.
Develop and manage CI/CD pipelines for automated build, test, and deployment of containerized services across multiple cloud environments.
Implement comprehensive monitoring, logging, and alerting solutions to detect, diagnose, and remediate incidents with minimal downtime.
Collaborate with cross‑functional teams to define and enforce best practices for infrastructure as code, security, and compliance.
Lead capacity planning, performance tuning, and cost optimization initiatives for large‑scale distributed systems.

Requirements

5+ years of experience in Site Reliability Engineering or DevOps roles, with a strong focus on Kubernetes.
Proficiency in cloud platforms (AWS or GCP) and experience with infrastructure as code tools such as Terraform or CloudFormation.
Hands‑on expertise with CI/CD tools (Jenkins, GitHub Actions, Argo CD) and container orchestration best practices.
Deep knowledge of monitoring and observability stacks (Prometheus, Grafana, ELK/EFK, or similar).
Strong scripting skills (Python, Bash) and a solid understanding of networking, security, and compliance requirements for mission‑critical systems.

Skills

kubernetes, cicd

Company: SpaceX

Department: Engineering

Location: Hawthorne, CA, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 175,000

Posted June 19, 2026