About the role
Crunchyroll is growing and changing, presenting unique challenges and opportunities to support millions of anime fans around the world. The AI/ML team provides seamless help to our internal stakeholders, ensuring a phenomenal experience for all Crunchyroll fans. The AI/ML team relies on strong MLOps practices to ensure models are reliable, scalable, and impactful in production.
- Design, build, and maintain end-to-end ML infrastructure and pipelines to support model training, deployment, and monitoring.
- Develop and manage CI/CD pipelines for ML to enable fast, reliable, and automated delivery of ML models.
- Implement and manage model registry, experiment tracking, and versioning using tools like MLflow, SageMaker Model Registry, or equivalent.
- Establish monitoring, observability, and alerting frameworks to detect drift, degradation, and anomalies in real time.
- Partner with data scientists to productionize ML models, ensuring seamless transition from research to production.
- Optimize ML workflows for performance, scalability, and cost-effectiveness across training and inference.
- Leverage platforms such as AWS SageMaker, Databricks, Kinesis, Lambda, Kubernetes (EKS), and Docker for ML operations.
- Collaborate with data engineering and software engineering teams to integrate ML services into large-scale distributed systems.
- Drive best practices for MLOps, including reproducibility, governance, compliance, and security of deployed models.
How you’ll work with Data Science
- Partner with ML Engineers to deploy and scale models built with frameworks like PyTorch, TensorFlow, and scikit-learn.
- Help data scientists track experiments, compare runs, and promote models to production.
- Translate research notebooks into production-grade pipelines with reproducible training and inference workflows.
- Co-own model lifecycle management: data, training, validation, deployment, monitoring, retraining.
- Ensure ML models align with software engineering best practices for testing, automation, and observability.
About You
We get excited about candidates, like you, because...
- Bachelor’s or Master’s degree in Data Science, Computer Science, Statistics, or a related field.
- 8+ years of experience in MLOps, ML infrastructure, or DevOps for AI/ML systems.
- Experience with MLflow, SageMaker, or Databricks ML for experiment tracking, model registry, and lifecycle management.
- Experience with Airflow or Step Functions for workflow orchestration.
- Experience with MLflow for monitoring ML models in production.
- Deep knowledge of CI/CD and automation frameworks (GitHub Actions, Terraform, CloudFormation).
- Hands-on experience with containerization (Docker) and orchestration (EKS).
- Proficiency in Python and scripting for ML integrations.
- Strong knowledge of cloud platforms (AWS preferred) and services relevant to ML (SageMaker, Lambda, S3, Kinesis, Step Functions).
- Understanding of security, compliance, and governance in ML production systems.
- Excellent problem-solving and communication skills, with a proven ability to work with cross-functional teams, including data scientists.