onsite
AI Operations & Infrastructure Engineer - Invictus
DevOps Engineer
About the role
Lead the design, deployment, and maintenance of high‑performance AI platforms, managing GPU hardware, container orchestration, and secure networking to deliver reliable AI workloads for classified missions.
Key Responsibilities
- Design, deploy, and maintain AI computing platforms, including GPU clusters and specialized hardware.
- Install, configure, and update GPU drivers, libraries, and AI software stacks.
- Implement containerization with Docker and orchestrate workloads using Kubernetes.
- Configure and optimize high‑speed networking (InfiniBand, Ethernet) for AI data pipelines.
- Ensure compliance with security policies and meet the requirements for maintaining a TS/SCI clearance.
Requirements
- Proven experience managing GPU‑accelerated AI infrastructure.
- Strong knowledge of Docker, Kubernetes, and container networking.
- Hands‑on experience with InfiniBand and high‑performance networking.
- Familiarity with secure deployment practices and classified environment standards.
- Excellent problem‑solving skills and the ability to work in a fast‑paced, mission‑critical setting.