onsite
Staff Machine Learning Infrastructure Engineer, Search & Discovery - Coupand
ML Engineer
Lead the design and scaling of machine learning infrastructure for search and discovery, building robust pipelines and services using Python, TensorFlow, Kubernetes, AWS, and Spark.
About the role
Key Responsibilities
- Architect, develop, and maintain large‑scale ML platforms that power search and discovery across the e‑commerce ecosystem.
- Design end‑to‑end data pipelines and model training workflows using Spark and TensorFlow, ensuring high throughput and low latency.
- Deploy, orchestrate, and monitor containerized ML services on Kubernetes clusters in AWS, implementing autoscaling and fault tolerance.
- Collaborate with data scientists, product managers, and SRE teams to translate research prototypes into production‑ready systems.
- Establish best practices for model versioning, reproducibility, and continuous integration/continuous deployment (CI/CD) of ML models.
Requirements
- 5+ years of experience building production ML infrastructure, preferably in a high‑traffic e‑commerce or search environment.
- Strong proficiency in Python and deep learning frameworks such as TensorFlow or PyTorch.
- Hands‑on experience with Kubernetes, Docker, and cloud services (AWS, GCP, or Azure) for large‑scale deployments.
- Expertise in distributed data processing using Spark or similar technologies.
- Solid understanding of software engineering principles, CI/CD pipelines, and monitoring/observability tools.
Skills
Python, TensorFlow, Kubernetes, AWS