onsite
AI Lead - Vision Language Action for Robotics - Doozy Robotics Pte. Ltd.
Software Engineer
Lead the design and implementation of Vision‑Language‑Action models for humanoid robots, overseeing multimodal architecture, training pipelines, data strategy, and edge deployment. The work is done in Python with deep‑learning frameworks.
About the role
Key Responsibilities
- Define the end‑to‑end AI architecture for Vision‑Language‑Action systems that enable robots to interpret instructions, perceive scenes, and execute actions.
- Develop and integrate multimodal models that fuse visual, linguistic, and motor data using frameworks such as PyTorch.
- Lead training pipelines for imitation learning (including behavior cloning), reinforcement learning, and large‑scale foundation models.
- Design and manage data collection, annotation, and preprocessing pipelines to support continuous fleet learning.
- Optimize model inference and deployment on edge robotics hardware, ensuring real‑time performance and reliability.
- Collaborate closely with perception, control, and simulation teams to align AI solutions with overall robot system requirements.
Requirements
- MS or PhD in Computer Science, Robotics, Machine Learning, or a related field.
- 5+ years of hands‑on experience building and deploying deep‑learning models for robotics or autonomous systems.
- Strong expertise in computer vision, natural language processing, and reinforcement and imitation learning.
- Proficiency in Python and deep‑learning libraries such as PyTorch or TensorFlow, with experience in ROS or similar robotics middleware.
- Demonstrated ability to lead technical teams, define architecture, and deliver production‑ready AI solutions on edge devices.
Skills
Python, PyTorch, reinforcement learning, computer vision