Remote
AI Infrastructure Engineer - Escalent
DevOps Engineer
Design, build, and maintain scalable AI/ML infrastructure on cloud platforms using Python, Kubernetes, Docker, Terraform, and AWS to support data analytics and model deployment pipelines.
About the role
Key Responsibilities
- Architect and implement robust, scalable AI/ML infrastructure on AWS, leveraging Kubernetes and Docker for container orchestration.
- Develop and maintain Infrastructure as Code (IaC) using Terraform to provision and manage cloud resources.
- Build automated CI/CD pipelines for model training, testing, and deployment, ensuring reproducibility and rapid iteration.
- Collaborate with data scientists and analytics teams to optimize data pipelines, storage, and compute resources for large‑scale model workloads.
- Monitor system performance, implement observability tools, and troubleshoot infrastructure issues to maintain high availability and cost efficiency.
Requirements
- Strong experience with Python for scripting, automation, and integration of AI workflows.
- Proficiency in Kubernetes and Docker for containerized application deployment.
- Hands‑on expertise with Terraform and AWS services (EC2, S3, EKS, Lambda, etc.).
- Solid understanding of the machine learning lifecycle and experience supporting model training and serving pipelines.
- Ability to work cross‑functionally, communicate technical concepts clearly, and drive infrastructure best practices.
Skills
Python, Kubernetes, Docker, Terraform, AWS, Machine Learning