remote
Machine Learning Infrastructure Engineer
We are seeking an ML Infrastructure Engineer to design and maintain scalable systems for training and deploying machine learning models in robotics applications.
About the role
Key Responsibilities
- Build and maintain scalable infrastructure for machine learning model training and deployment
- Develop CI/CD pipelines for ML model versioning and testing
- Optimize GPU/TPU resource allocation for training workloads
- Collaborate with data scientists to streamline model deployment workflows
- Monitor and troubleshoot infrastructure performance and reliability
Requirements
- 3+ years of experience in ML infrastructure or related roles
- Proficiency in containerization (Docker) and orchestration (Kubernetes)
- Experience with MLOps tools (MLflow, Kubeflow, or similar)
- Strong scripting skills (Python, Bash) and cloud platform knowledge (AWS/GCP)
- Familiarity with distributed computing and GPU acceleration
Skills
MLOps, TensorFlow, Kubernetes, Docker, CI/CD pipelines, cloud platforms