onsite

Staff Machine Learning Infrastructure Engineer, Search & Discovery - Coupand

ML Engineer

Lead the design and scaling of machine learning infrastructure for search and discovery, building robust pipelines and services using Python, TensorFlow, Kubernetes, AWS, and Spark.

About the role

Key Responsibilities

Architect, develop, and maintain large‑scale ML platforms that power search and discovery across the e‑commerce ecosystem.
Design end‑to‑end data pipelines and model training workflows using Spark and TensorFlow, ensuring high throughput and low latency.
Deploy, orchestrate, and monitor containerized ML services on Kubernetes clusters in AWS, implementing autoscaling and fault‑tolerance.
Collaborate with data scientists, product managers, and SRE teams to translate research prototypes into production‑ready systems.
Establish best practices for model versioning, reproducibility, and continuous integration/continuous deployment (CI/CD) of ML models.

Requirements

5+ years of experience building production ML infrastructure, preferably in a high‑traffic e‑commerce or search environment.
Strong proficiency in Python and deep learning frameworks such as TensorFlow or PyTorch.
Hands‑on experience with Kubernetes, Docker, and cloud services (AWS, GCP, or Azure) for large‑scale deployments.
Expertise in distributed data processing using Spark or similar technologies.
Solid understanding of software engineering principles, CI/CD pipelines, and monitoring/observability tools.

Skills

Python, TensorFlow, Kubernetes, AWS

Company: Coupand

Department: Research

Location: Mountain View, United States

Experience: 7+ years

Tenure: full-time

Level: Lead

Posted June 26, 2026