remote
ML Ops and Model Accuracy Engineer - Capgemini
Systems Engineer
Design and operate scalable, high‑availability backend platforms for machine‑learning workloads, applying strong engineering practices, automated testing, and continuous delivery to ensure model accuracy and performance.
About the role
Key Responsibilities
- Design and build robust, scalable backend services and platform components optimized for ML training and inference workloads.
- Implement CI/CD pipelines, automated testing, and code‑quality standards to support reliable, rapid delivery.
- Deploy, monitor, and manage containerized ML workloads using Docker and Kubernetes in cloud environments such as AWS.
- Integrate model‑tracking and experiment‑management tools (e.g., MLflow) to maintain model versioning, reproducibility, and accuracy monitoring.
- Collaborate with data scientists and software engineers to translate model requirements into production‑ready services.
Requirements
- Strong proficiency in Python and experience building production‑grade services.
- Hands‑on experience with containerization (Docker) and orchestration (Kubernetes) on cloud platforms, preferably AWS.
- Solid understanding of CI/CD concepts and tools (Jenkins, GitLab CI, GitHub Actions).
- Familiarity with ML lifecycle management and model‑serving tools such as MLflow, TensorFlow Serving, or similar.
- Demonstrated ability to enforce engineering best practices, including modular design, automated testing, and performance optimization.
Skills
Python, Docker, Kubernetes, AWS, CI/CD, MLflow