onsite
Machine Learning Lead, Evaluation
Waymo is looking for a Machine Learning Lead to guide the development of ML-based evaluation metrics and systems for autonomous driving technology. This role involves architecting scalable systems, implementing novel RL algorithms, and leading the application of Deep Learning and Generative AI to enhance the Waymo Driver's performance.
About the Role
The DUE Machine Learning team at Waymo focuses on building and operating scalable machine learning and data systems, enhancing simulation workflow and insight tools, and accelerating evaluation and onboard developer journeys. This team integrates expert human judgments with advanced machine learning models to provide training and evaluation data for a multitude of metrics and components that constitute the Waymo Driver. We are seeking researchers and software engineers passionate about developing machine learning techniques for our autonomous service's Evaluation systems, who will continuously drive performance improvements in our technology stack.
Responsibilities
- Grow the end-to-end strategy for Waymo's next generation of machine learning-based evaluation metrics, promoting scientific and statistical rigor across embodied AI applications.
- Architect and build scalable systems for training and fine-tuning large-scale generative models to produce and evaluate realistic, interesting driving behaviors.
- Lead the design, implementation, and iteration of novel RL algorithms, reward functions, and training paradigms tailored for generating high-fidelity and insightful driving behaviors.
- Lead the development of cutting-edge Deep Learning models and Generative AI (LLM/VLM) solutions to enhance human-led triaging, introduce automation for high-volume workflows, and perform nuanced analysis of self-driving behavior to detect critical anomalies.
- Proactively monitor and assimilate best practices from within Alphabet and the broader industry to develop a novel data collection and evaluation system based on Reinforcement Learning from Human Feedback (RLHF).
- Provide technical mentorship, guidance, and thought leadership to other engineers within the team and across collaborating groups.
- Guide and align multiple teams, including Driver Understanding, Simulation, System Engineering, Research, and Onboard Software, on a cohesive evaluation strategy, ensuring cross-functional alignment on goals and priorities.
Requirements
- PhD in Computer Science, Machine Learning, Artificial Intelligence, or a related technical field, or equivalent practical experience.
- 10+ years of hands-on experience in developing and applying Machine Learning models, with a significant focus on Reinforcement Learning.
- 2+ years of people management experience.
- Demonstrated expertise in deep learning, sequence modeling, and generative models.
- Strong publication record or history of impactful project delivery in RL or related areas.
- Proficiency in Python and standard ML frameworks (e.g., JAX, TensorFlow).
- Experience with large-scale distributed training and data processing.
- Proven ability to lead complex and ambiguous technical projects from conception to completion.
Preferred Qualifications
- 12+ years of relevant experience in ML/RL research and application.
- Experience in the autonomous vehicles domain, robotics, or complex simulation environments.
- Deep understanding of state-of-the-art RL techniques, including those used for fine-tuning large models (e.g., from human feedback/preferences).
- Familiarity with large-scale simulation platforms and their integration with ML training workflows.
- Experience designing and using metrics for evaluating complex AI systems.
- Track record of technical leadership, influencing senior stakeholders, and driving innovation across team boundaries.
- Excellent communication skills, with the ability to articulate complex technical concepts clearly.