On-site

ML Platform Engineer - Stitch Fix

DevOps Engineer

Build and scale machine‑learning infrastructure on cloud platforms so that data scientists can deploy models efficiently. The role uses Python, TensorFlow, Kubernetes, and AWS, and covers CI/CD pipelines and big‑data processing with Spark.

About the role

Key Responsibilities

Design, develop, and maintain a scalable ML platform that supports end‑to‑end model training, serving, and monitoring.
Implement infrastructure as code using Kubernetes and AWS services to ensure high availability and cost‑effective scaling.
Build CI/CD pipelines for automated testing, containerization, and deployment of ML workloads.
Collaborate with data scientists to integrate frameworks such as TensorFlow and PyTorch into production pipelines.
Optimize data processing workflows with Spark and manage data pipelines for feature engineering.
Monitor system performance, troubleshoot issues, and continuously improve platform reliability and security.

Requirements

5+ years of experience in software engineering or ML platform development.
Strong proficiency in Python and experience with deep‑learning libraries (TensorFlow, PyTorch).
Hands‑on experience with Kubernetes, Docker, and AWS (EKS, S3, SageMaker, etc.).
Proven ability to build CI/CD pipelines and automate deployments.
Experience processing large datasets using Spark or similar big‑data technologies.

Skills

Python, TensorFlow, Kubernetes, AWS, CI/CD

Company: Stitch Fix

Department: Engineering

Location: United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 26, 2026