onsite
Site Reliability Engineer - Security Clearance - General Atomics Intelligence
Senior SRE responsible for designing, deploying, and maintaining high‑availability, secure data‑processing pipelines that ingest petabytes of streaming data in real time, leveraging AWS, Kubernetes, and machine‑learning workflows.
About the role
Key Responsibilities
- Architect and operate scalable, highly available data pipelines on AWS, ensuring 99.99% uptime for mission‑critical intelligence workloads.
- Implement and maintain Kubernetes clusters, Helm charts, and CI/CD pipelines to automate deployments and rollbacks.
- Integrate machine‑learning models into real‑time streaming pipelines and monitor their performance and drift.
- Lead incident response, root‑cause analysis, and post‑mortem documentation for production outages.
- Collaborate with security teams to enforce hardening, vulnerability management, and compliance with DoD and intelligence community standards.
Requirements
- 5+ years of SRE or DevOps experience in large‑scale, data‑intensive environments.
- Proficiency with AWS services (EKS, S3, Lambda, CloudWatch) and Kubernetes.
- Strong scripting skills in Python and experience with CI/CD tools (GitHub Actions, Jenkins, ArgoCD).
- Hands‑on experience with machine‑learning model deployment and monitoring.
- Active security clearance (or the ability to obtain one) and familiarity with DoD security frameworks.
Skills
Python, AWS, Kubernetes, machine learning, CI/CD