remote
AI Platform Engineer - KUNGFU.AI
DevOps Engineer
Lead the design, deployment, and scaling of production‑grade AI platforms using Python, ML frameworks, and cloud‑native tooling such as AWS, Docker, and Kubernetes, ensuring robust CI/CD pipelines and high availability for enterprise clients.
About the role
Key Responsibilities
- Architect and maintain scalable AI platform infrastructure on AWS, integrating services like SageMaker, Lambda, and ECS.
- Build and optimize containerized ML pipelines with Docker and Kubernetes, ensuring high availability and fault tolerance.
- Implement CI/CD workflows using GitHub Actions, Jenkins, or Argo CD to automate model training, testing, and deployment.
- Collaborate with ML Engineers and Data Scientists to translate research prototypes into production‑ready services.
- Monitor system performance, troubleshoot bottlenecks, and apply best practices for security and compliance.
Requirements
- 3+ years of experience in cloud‑native platform engineering, preferably in AI/ML contexts.
- Proficiency in Python, Docker, Kubernetes, and AWS services (SageMaker, ECS, EKS, CloudWatch).
- Hands‑on experience with CI/CD pipelines and automated testing for ML workflows.
- Strong problem‑solving skills and ability to work cross‑functionally in a fast‑paced environment.
- Excellent communication skills and a passion for delivering measurable business impact through technology.
Skills
python, machine learning, aws, docker, kubernetes, cicd