remote
AI Platform Engineer - accelerant
DevOps Engineer
Lead the design, deployment, and scaling of AI/ML infrastructure on AWS, building robust pipelines with Python, Kubernetes, and Terraform to support global data science teams.
About the role
Key Responsibilities
- Architect and maintain end‑to‑end AI/ML pipelines on AWS, ensuring high availability and scalability.
- Develop and deploy containerized services using Docker and Kubernetes, automating rollouts with CI/CD pipelines.
- Implement infrastructure as code with Terraform, managing resources, security groups, and IAM roles.
- Collaborate with data scientists to optimize model training, inference, and monitoring workflows.
- Monitor system performance, troubleshoot issues, and continuously improve reliability and cost efficiency.
Requirements
- 5+ years of experience in cloud engineering, with a focus on AI/ML workloads.
- Proficiency in Python, AWS services (SageMaker, ECS, EKS, S3, Lambda), and container orchestration.
- Hands‑on experience with Terraform, CI/CD tools (GitHub Actions, Jenkins), and monitoring solutions.
- Strong understanding of the ML model lifecycle, data pipelines, and performance tuning.
- Excellent problem‑solving skills and the ability to work in a fast‑paced, global team.
Skills
Python, AWS, machine learning, Kubernetes, Docker, Terraform, CI/CD