Remote
Director AI Operations - project44
Systems Engineer
Lead AI operations for a global supply‑chain platform, driving end‑to‑end ML model deployment, data pipeline automation, and cloud infrastructure scaling to deliver real‑time, AI‑powered logistics insights.
About the role
Key Responsibilities
- Architect and oversee the end‑to‑end lifecycle of machine learning models, from data ingestion to production deployment, ensuring high availability and low latency for real‑time supply‑chain insights.
- Lead a cross‑functional team of data scientists, ML engineers, and DevOps specialists to build scalable data pipelines and model serving infrastructure on AWS.
- Implement CI/CD pipelines, containerization (Docker, Kubernetes), and automated testing to accelerate model rollouts while maintaining rigorous quality standards.
- Collaborate with product, analytics, and operations stakeholders to translate business requirements into technical solutions and prioritize feature development.
- Monitor system performance, troubleshoot issues, and continuously optimize resource utilization and cost efficiency across cloud environments.
Requirements
- 10+ years of experience in AI/ML operations, with a proven track record of scaling production ML systems in a high‑volume, real‑time environment.
- Deep expertise in Python, AWS services (SageMaker, Lambda, ECS/EKS), and container orchestration (Kubernetes).
- Strong background in data engineering, ETL pipelines, and data lake architecture.
- Hands‑on experience with CI/CD, automated testing, and monitoring tools (Prometheus, Grafana, CloudWatch).
- Excellent leadership, communication, and stakeholder‑management skills.
Skills
Machine learning, Python, AWS, Kubernetes