hybrid
Staff Machine Learning Platform Engineer
Machine Learning Platform Engineer
As a Staff Machine Learning Platform Engineer at Faire, you will design, improve, and operate a scalable ML platform to accelerate model training, deployment, and governance. You will act as the technical bridge between data science and production engineering, enabling the team to support tens of thousands of local businesses. This role involves building and maintaining ML infrastructure, implementing CI/CD pipelines, and ensuring data governance and platform health.
About this role
As a Staff Machine Learning Platform Engineer, you will help design, improve, and operate a scalable ML platform to accelerate model training, deployment, and governance. You are the technical bridge between data science and production engineering. You’ll be joining a small but deeply critical team that scales Faire’s ability to support tens of thousands of local businesses in a constantly narrowing retail landscape.
What You Will Do
- Design and operate ML infrastructure, including workspaces, clusters, jobs, and workflows
- Productionize ML workloads using Spark, Delta Lake, MLflow, and Databricks Workflows
- Teach data scientists how to use our ML platform to take our most critical models from notebook to production
- Implement Unity Catalog for data governance, lineage, access control, and secure multi-tenant usage
- Build CI/CD pipelines for ML using Terraform and Git-based workflows (e.g., GitHub Actions)
- Optimize performance, reliability, and cost across training and inference workloads
- Configure Identity and Access Management (IAM) and Role-Based Access Control (RBAC) for sensitive data sets
- Establish observability for data quality, model performance, and platform health
- Build and maintain ML Platform technical documentation
What it takes
- 8+ years of experience building production ML or data platforms
- A degree (preferably graduate level) in Computer Science, Engineering, Statistics, or a related technical field
- Strong hands-on expertise with Databricks, Spark, Delta Lake, and MLflow
- Proficiency in Python, SQL, and distributed systems concepts
- Experience with cloud platforms and infrastructure-as-code
- Solid understanding of MLOps best practices: CI/CD, monitoring, reproducibility, and security
- Experience supporting multiple ML teams in a shared platform environment
- Active ownership of orphaned problems and a willingness to learn whatever knowledge you’re missing to get the job done
Tech Stack
Faire uses a modern cloud-based tech stack. For this role, you’ll want to be proficient with the following:
Languages
ML Frameworks
Big Data & Processing
- Spark
- Kafka
- Databricks
- Snowflake
- Fivetran
- Iceberg
- Unity Catalog
- Datadog
- Airflow
- CockroachDB
- MySQL
Cloud & Infrastructure
- AWS
- S3
- SageMaker
- Kubernetes
- Docker
- GitHub Actions
- Terraform
Generative AI
- Claude Sonnet 4.5
- ChatGPT 5.2