Remote
Advisory AI Infrastructure Engineer - Lenovo
DevOps Engineer
Lead the design and deployment of scalable AI infrastructure, leveraging Kubernetes, Docker, and AWS to support machine learning workloads on GPU clusters. Drive performance, reliability, and cost‑efficiency for enterprise AI solutions.
About the role
Key Responsibilities
- Architect and maintain production‑grade AI platforms using Kubernetes, Docker, and AWS services.
- Design GPU‑accelerated pipelines for training and inference of machine learning models.
- Collaborate with data scientists and software engineers to optimize model performance and resource utilization.
- Implement CI/CD workflows, monitoring, and automated scaling for AI workloads.
- Ensure security, compliance, and cost management across cloud and on‑premises environments.
Requirements
- 5+ years of experience in AI/ML infrastructure engineering.
- Proficiency with Kubernetes, Docker, and AWS (EKS, S3, EC2, SageMaker).
- Strong scripting skills in Python and experience with GPU programming (CUDA, cuDNN).
- Knowledge of CI/CD, monitoring, and observability tools (Prometheus, Grafana, ELK).
- Excellent problem‑solving skills and ability to work cross‑functionally in a fast‑paced environment.
Skills
Python, Kubernetes, Docker, AWS, machine learning