Onsite
Lead AI Infrastructure Engineer - Staffed4U
DevOps Engineer
Senior engineer leading the design, deployment, and operation of scalable AI/ML platforms using Python, AWS, Kubernetes, Docker, and Terraform to deliver robust, secure, and high‑performance infrastructure.
About the role
Key Responsibilities
- Architect and maintain end‑to‑end AI/ML infrastructure, ensuring scalability, reliability, and security across cloud and on‑prem environments.
- Lead the implementation of containerized workloads with Docker and Kubernetes, optimizing resource utilization for GPU‑intensive training and inference pipelines.
- Develop and enforce CI/CD pipelines for model deployment, using Terraform for infrastructure as code and automated testing frameworks for validation.
- Collaborate with data scientists and ML engineers to integrate model training, versioning, and monitoring into the production workflow.
- Oversee performance tuning, cost management, and capacity planning for large‑scale AI workloads.
Requirements
- 10+ years of experience in AI/ML infrastructure engineering, with a proven track record of leading complex projects.
- Expertise in Python, AWS services (SageMaker, ECS, EKS, S3), Kubernetes, Docker, and Terraform.
- Strong background in MLOps practices, model deployment, and monitoring tools.
- Excellent problem‑solving skills and ability to mentor a multidisciplinary team.
- TS/SCI security clearance with polygraph required.
Skills
Python, AWS, Kubernetes, Docker, Terraform