onsite

Site Reliability Engineer, C2 Systems - Anduril

This Site Reliability Engineer role focuses on building and maintaining scalable, resilient infrastructure for AI‑powered defense systems, using Kubernetes, Docker, and AWS along with observability tools such as Prometheus and Grafana.

About the role

Key Responsibilities

Design, deploy, and manage containerized services on Kubernetes clusters to support real‑time command and control workloads.
Implement CI/CD pipelines and infrastructure as code (Terraform) for rapid, reliable releases.
Monitor system health with Prometheus, Grafana, and custom alerts; troubleshoot performance and availability issues.
Collaborate with software, security, and operations teams to enforce best practices and improve system reliability.
Automate routine operational tasks using scripting (Python, Bash) and configuration management.

Requirements

3+ years of SRE or DevOps experience in a high‑availability environment.
Proficiency with Kubernetes, Docker, and cloud platforms (AWS).
Hands‑on experience with monitoring/alerting tools such as Prometheus and Grafana.
Strong scripting skills (Python or Bash) and familiarity with Terraform or similar IaC tools.
Excellent problem‑solving skills and a proactive, collaborative mindset.

Skills

Kubernetes, Docker, AWS, Prometheus, Grafana, Terraform

Company: Anduril

Department: Engineering

Location: Honolulu, HI, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 20, 2026