remote
AI Operations Manager - solvedex
Systems Engineer
Lead AI operations, ensuring seamless deployment, monitoring, and scaling of machine learning models across cloud environments using Python, DevOps practices, and AWS/Kubernetes infrastructure.
About the role
Key Responsibilities
- Design, implement, and maintain end‑to‑end ML model pipelines from training to production.
- Collaborate with data scientists to translate research prototypes into scalable, production‑ready services.
- Automate model deployment, monitoring, and rollback using CI/CD pipelines and container orchestration (Kubernetes).
- Ensure high availability, performance, and security of AI services on AWS.
- Analyze model drift and performance metrics, and trigger retraining workflows when needed.
Requirements
- 5+ years of experience in ML operations or related roles.
- Proficiency in Python, Docker, and Kubernetes.
- Hands‑on experience with AWS services (EKS, SageMaker, Lambda, CloudWatch).
- Strong understanding of CI/CD tools (GitHub Actions, Jenkins, ArgoCD).
- Excellent problem‑solving skills and ability to work cross‑functionally.
Skills
machine learning, python, aws, kubernetes