Onsite
Staff Machine Learning Engineer - VLM/LLM Evaluation - Waymo
ML Engineer
Lead the design and implementation of evaluation frameworks for vision‑language models (VLMs) and large language models (LLMs), owning the metrics, data pipelines, and scalable infrastructure used to improve autonomous driving perception and decision‑making systems.
About the role
Key Responsibilities
- Design and build robust evaluation pipelines for Vision‑Language Models (VLMs) and Large Language Models (LLMs) used in perception and planning stacks.
- Develop and maintain metrics, benchmarks, and automated testing suites to assess model performance, safety, and reliability at scale.
- Collaborate with research, simulation, and product teams to integrate evaluation results into the continuous improvement loop of the autonomous driving system.
- Implement distributed training and inference workflows using Python, TensorFlow, and PyTorch on large‑scale compute clusters.
- Analyze failure cases, generate insights, and propose model or data enhancements to meet safety and accuracy targets.
Requirements
- Ph.D. or Master’s degree in Computer Science, Electrical Engineering, or a related field, with 7+ years of hands‑on ML experience.
- Deep expertise in LLMs, VLMs, and modern deep‑learning frameworks (TensorFlow, PyTorch).
- Proven track record building large‑scale evaluation infrastructure, metrics, and data pipelines.
- Strong programming skills in Python and experience with distributed computing platforms (e.g., Kubernetes, Ray, Spark).
- Excellent problem‑solving ability and the communication skills to collaborate effectively with cross‑functional teams.
Skills
python, machine learning, tensorflow, pytorch