onsite

Sr. Site Reliability Engineer, Starshield - SpaceX

Site Reliability Engineer

As a Senior Site Reliability Engineer, you will drive the reliability, scalability, and security of Starshield’s satellite‑based services using Kubernetes, Terraform, and AWS. You will also ensure robust monitoring, incident response, and continuous delivery pipelines for mission‑critical government applications.

About the role

Key Responsibilities

Design, implement, and maintain highly available, scalable infrastructure for Starshield’s satellite services on AWS and Kubernetes clusters.
Develop and manage Terraform modules and CI/CD pipelines to automate provisioning, configuration, and deployment of services.
Implement comprehensive monitoring, alerting, and logging solutions (Prometheus, Grafana, ELK) to ensure 99.99% uptime and rapid incident resolution.
Collaborate with security, networking, and product teams to enforce best practices, perform threat modeling, and harden infrastructure against cyber threats.
Lead post‑mortem analyses, root cause investigations, and continuous improvement initiatives to reduce MTTR and prevent recurrence.

Requirements

5+ years of SRE or DevOps experience in a high‑scale, mission‑critical environment.
Proficiency with Kubernetes, Terraform, AWS (EC2, RDS, S3, VPC), and Linux system administration.
Strong scripting skills in Python or Bash and experience with CI/CD tools (GitHub Actions, Jenkins, ArgoCD).
Hands‑on experience with monitoring/alerting stacks (Prometheus, Grafana, Loki, ELK) and incident management tools.
Excellent communication, problem‑solving, and collaboration skills in a fast‑paced, cross‑functional team.

Skills

Kubernetes, Terraform, AWS, Linux, CI/CD

Company: SpaceX

Department: Engineering

Location: Redmond, WA, United States

Experience: 5+ years

Tenure: Full-time

Level: Senior

Salary: 230,000

Posted June 19, 2026