On-site
Staff Engineer, Machine Learning - Graphcore
ML Engineer
Senior ML engineer responsible for validating and benchmarking Graphcore's AI accelerator stack, building automated test pipelines, and diagnosing performance, precision, and correctness issues across modern frameworks and distributed execution environments.
About the role
Key Responsibilities
- Design and implement automated benchmarking pipelines for open‑source models running on Graphcore hardware.
- Validate numerical precision, quantisation, attention mechanisms, and other low‑level ML behaviours across TensorFlow, PyTorch, and custom runtimes.
- Identify regressions, correctness bugs, and performance bottlenecks in both single‑node and distributed execution environments.
- Collaborate with hardware, compiler, and software teams to provide clear diagnostics and actionable feedback.
- Develop test harnesses and tooling to continuously assess the AI stack from silicon to framework level.
Requirements
- 5+ years of experience in ML engineering, performance testing, or systems validation.
- Strong proficiency in Python and C++ with hands‑on experience in CUDA and Linux environments.
- Deep knowledge of major ML frameworks such as TensorFlow and PyTorch, including model quantisation and distributed training.
- Proven track record building benchmarking or profiling infrastructure for hardware‑accelerated workloads.
- Excellent problem‑solving skills and ability to communicate complex technical findings to cross‑functional teams.
Skills
Python, C++, TensorFlow, PyTorch, CUDA, Linux