About Crunchyroll
Founded by fans, Crunchyroll delivers the art and culture of anime to a passionate community. We super-serve over 100 million anime and manga fans across 200+ countries and territories, and help them connect with the stories and characters they crave. Whether that experience is online or in-person, streaming video, theatrical, games, merchandise, events and more, it’s powered by the anime content we all love.
Join our team, and help us shape the future of anime!
The Staff Software Engineer owns the design, delivery, and operationalization of AI/ML platform services, with a focus on LLM- and generative AI-driven workflows. These services enable teams across Enterprise Technology to productionize model-driven automation and agent-style workflows. You will define how these systems operate in production by establishing core platform capabilities, operational standards, and system-level patterns that ensure solutions are reliable, scalable, and cost-efficient. This role focuses on intelligent systems engineering, MLOps, workflow orchestration, observability, governance, and building scalable systems that optimize complex, multi-step business workflows.
What You’ll Do:
- Define and deliver platform capabilities that enable end-to-end automation of complex business workflows using model-driven and LLM-powered systems.
- Design and implement scalable architectures for processing, enriching, and transforming large volumes of structured and unstructured data across multi-step workflows.
- Establish patterns for orchestrating model-driven workflows, including sequencing, dependency management, retries, and failure handling across distributed systems.
- Design and implement patterns for integrating and orchestrating large language models and generative AI services, including prompt design, evaluation, and optimization for production workflows.
- Define model serving strategies and system architectures that balance latency, throughput, accuracy, and cost across batch and real-time workloads.
- Implement evaluation and validation frameworks to ensure consistent quality, reliability, and performance of model-driven systems in production.
- Define and enforce data quality, lineage, and auditability standards to ensure traceability and reproducibility of outputs.
- Lead system design and delivery with cross-functional stakeholders, driving alignment between engineering, data, and business teams.
What We’re Looking For:
- 12+ years of software engineering experience, including significant experience building and operating large-scale distributed systems.
- Experience designing and delivering AI/ML or model-driven systems in production environments.
- Hands-on experience working with LLMs and generative AI systems, including prompt design, evaluation, and production integration.
- Strong experience in Python and in building backend services with modern languages and runtimes such as TypeScript/Node.js.
- Experience with cloud platforms (GCP or AWS), Kubernetes, and infrastructure as code.
- Experience building and operating data pipelines and workflow orchestration systems (batch and streaming).
- Proven ability to design systems that balance scalability, reliability, and cost efficiency.
- Experience implementing observability, monitoring, and reliability practices for production systems.
- Strong system design skills and ability to lead complex technical initiatives across teams.
Preferred Qualifications:
- Experience building AI/ML platforms, MLOps tooling, or shared systems used across multiple teams.
- Experience designing and operating LLM-driven or generative AI workflows at scale.
- Familiarity with retrieval-augmented generation (RAG) and embedding architectures, and with evaluation strategies for model-driven systems.
- Experience designing orchestration patterns for automated or semi-automated workflows.
- Experience designing systems with strong traceability, explainability, and clear data lineage.
- Experience influencing architecture and technical direction across multiple teams.