Machine Learning Engineer, Pegasus
As a Machine Learning Engineer on the Pegasus team, you will build, improve, and operate production-grade ML systems for video understanding. You will contribute across the entire ML stack, developing VLM serving systems and multimodal data pipelines, while driving technical decisions and using AI development tools to work more productively.
Pegasus is a video understanding model built to solve a common problem: organizations have abundant video but little actionable data from it. Traditional video AI often stops at summarizing videos or answering questions about them, but real business environments require more. Enterprises need to know when specific scenes appear, when events occur, and how to connect this information to search, classification, archiving, and editing systems.
Pegasus addresses these needs by jointly understanding a video's visual content, audio, and on-screen text, then transforming it into time-based, structured data. With its key feature, Segment, customers define the segment types and metadata schemas they want, and Pegasus identifies the start and end times of matching scenes and returns structured information such as titles, summaries, people, topics, visual elements, and domain-specific labels.
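To make the idea of time-based, structured output concrete, here is a minimal Python sketch of what a customer-defined segment type and one returned segment might look like. It is an illustration only and not the Pegasus API: the class names `SegmentDefinition` and `Segment`, the field names, the `validate_segment` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SegmentDefinition:
    """Hypothetical customer-defined segment type with its metadata fields."""
    name: str
    description: str
    metadata_fields: dict[str, type]  # field name -> expected Python type


@dataclass
class Segment:
    """Hypothetical time-bounded result for one matched scene."""
    segment_type: str
    start_sec: float
    end_sec: float
    metadata: dict[str, Any] = field(default_factory=dict)


def validate_segment(segment: Segment, definition: SegmentDefinition) -> list[str]:
    """Return a list of problems; an empty list means the segment fits the definition."""
    problems = []
    if segment.segment_type != definition.name:
        problems.append(f"unexpected segment type: {segment.segment_type}")
    if not 0 <= segment.start_sec < segment.end_sec:
        problems.append("start_sec must be non-negative and earlier than end_sec")
    for name, expected in definition.metadata_fields.items():
        if name not in segment.metadata:
            problems.append(f"missing metadata field: {name}")
        elif not isinstance(segment.metadata[name], expected):
            problems.append(f"wrong type for metadata field: {name}")
    return problems


# Illustrative segment type and result; all values are made up.
interview = SegmentDefinition(
    name="interview",
    description="A person answering questions on camera",
    metadata_fields={"title": str, "summary": str, "people": list, "topics": list},
)

example = Segment(
    segment_type="interview",
    start_sec=125.0,
    end_sec=312.5,
    metadata={
        "title": "Coach post-game interview",
        "summary": "The coach reviews the second half.",
        "people": ["head coach"],
        "topics": ["defense", "substitutions"],
    },
)

print(validate_segment(example, interview))
```

Checking each returned segment against the customer's definition in this way is one plausible approach to keeping downstream search, archiving, and editing systems from receiving malformed records.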
The core purpose of Pegasus is not merely to understand video content but to transform video into data that is immediately usable in production environments and business workflows. More than a simple Video LLM, Pegasus serves as foundational infrastructure for areas including Search, Archiving, Compliance, CMS, and content operations automation.
The Pegasus team is central to TwelveLabs' video understanding capabilities and develops Pegasus, the company's core Video Analysis product. The team builds multimodal video analysis systems that follow instructions reliably and generate complex hierarchical outputs. It prioritizes quickly delivering products that provide real value to users rather than focusing solely on research. To achieve this, ML researchers and engineers collaborate closely in a goal-oriented, cross-functional team.
The team's work spans a wide range of engineering challenges: building training infrastructure for multimodal LLM development, from pre-training to RL; developing production-grade systems for time-based segmentation and structured metadata generation; designing large-scale inference systems that can process hours of video in a single request; and establishing data curation and evaluation pipelines for rapid iteration and model quality improvement.
The team also uses world-class AI computing infrastructure, including NVIDIA B300, to push the boundaries of video analysis systems and to shorten the cycle from research to production.
Posted June 10, 2026