Remote
Staff MLOps Engineer, AI/ML Platform
MLOps Engineer
Lead the design, implementation, and scaling of AI/ML platform services on AWS, using Amazon EKS, Apache Spark, and Databricks to deliver robust, production‑grade MLOps pipelines and infrastructure.
About the role
Key Responsibilities
- Architect and build end‑to‑end MLOps pipelines on AWS, integrating Apache Spark batch jobs and Databricks notebooks.
- Design, deploy, and manage containerized workloads using Amazon EKS and Kubernetes, ensuring high availability and scalability.
- Implement CI/CD automation for model training, validation, and deployment, incorporating best practices for versioning, testing, and monitoring.
- Collaborate with data scientists and software engineers to translate research prototypes into production‑ready services.
- Establish observability, logging, and alerting frameworks to maintain platform reliability and performance.
Requirements
- 5+ years of experience building and operating large‑scale ML platforms on AWS.
- Strong expertise with Amazon EKS/Kubernetes, Apache Spark, and Databricks.
- Proficiency in CI/CD tools (e.g., Jenkins, GitLab CI, GitHub Actions) and infrastructure‑as‑code (Terraform, CloudFormation).
- Solid programming skills in Python or Scala for data processing and model orchestration.
- Demonstrated ability to implement MLOps best practices, including model registry, monitoring, and automated rollback.
Skills
AWS, Apache Spark, Databricks, MLOps, Kubernetes, CI/CD