About the Opportunity
Join our innovative AI Research and Development team as an AI/ML Infrastructure Engineer. In this role, you'll collaborate with world-class scientists and engineers to design, deploy, and maintain state-of-the-art Linux-based infrastructure. You'll work at the cutting edge of machine learning and artificial intelligence, contributing to projects in computer vision, natural language processing, large language models, and more. Help shape the future of AI in the defense industry!
Responsibilities
- Design, procure, build, implement, and maintain on-prem servers, workstations, and software tooling across multiple development environments.
- Implement infrastructure components to support distributed compute, storage, and dataset management.
- Ensure system compliance with security requirements, including applying system updates and developing a System Security Plan (SSP).
- Manage access and integration with commercial cloud providers such as AWS.
- Design, implement, and maintain hybrid cloud architectures connecting on-premises environments with cloud infrastructure.
- Develop and support cross-domain solutions for secure data transfer between networks of varying classification levels.
- Configure and manage networking components (VPNs, Direct Connect, gateways, firewalls) to support hybrid connectivity.
- Collaborate with cybersecurity, infrastructure, and application teams to ensure secure, scalable, and resilient cloud integrations.
- Monitor and optimize cloud connectivity performance, availability, and cost efficiency.
- Support accreditation and compliance activities related to cloud and cross-domain architectures.
- Coordinate with data scientists and software engineers to understand and plan infrastructure requirements.
- Support other hardware, such as edge devices, as required by projects/customers.
Required Qualifications
- Ability to obtain a Top Secret or Top Secret/SCI (TS/SCI) clearance.
- Bachelor's degree or equivalent experience.
- Minimum 5 years of Linux systems administration experience.
- Expertise in scripting languages and automation tools such as Bash, Python, and/or Ansible for automating and orchestrating Linux systems.
- Knowledge of Docker and Kubernetes, including deployment, scaling, and management of containerized applications.
- Experience with on-prem servers, including hardware selection, deployment, maintenance, and troubleshooting.
- Networking skills, including knowledge of network protocols, routing, and switching.
Desired Qualifications
- Experience managing a GPU-enabled compute cluster.
- Experience managing a FIPS-enabled Linux environment.
- Experience with high-availability data solutions such as Ceph.
- Experience with cross-domain/hybrid solutions.
- Experience with DevSecOps tooling, such as CI/CD pipelines and logging and monitoring tools.
- Experience with virtualization, including VMware or KVM.
- Understanding of AI/ML concepts and the AI/ML development lifecycle.
- Security+ Certification.
- Experience working in environments with high regulatory and compliance requirements.
- Active Top Secret or TS/SCI clearance.
- Experience working with hardware in classified environments.
- Experience working on proposals.