Senior Machine Learning Engineer - Training Platform
As a Senior Machine Learning Engineer, you will design, scale, and mature systems and infrastructure for AI training workloads on a Kubernetes-based platform. You will improve the platform's reliability, scalability, and usability, collaborating with research scientists, AI engineers, product teams, and cloud/infrastructure teams to shape its roadmap and enable AI-powered experiences at scale.
We’re part of the Training Platform team within Canva’s AI Platform group, which sits in the Generative AI supergroup. Our team is responsible for the systems that power model training at scale, building the foundations that enable teams across Canva to create, train, and scale AI-powered experiences.
Our focus is on building reliable, efficient, and developer-friendly training infrastructure — from orchestration and distributed training systems to experimentation and platform capabilities that support large-scale AI workloads.
We enable teams across Canva to push the boundaries of what’s possible with AI.
As a Senior Machine Learning Engineer, you’ll focus on designing, scaling, and maturing the systems and infrastructure that support training workloads across Canva. You’ll work on a Kubernetes-based training platform that enables distributed AI workloads across a wide range of teams, frameworks, and use cases. You’ll also contribute to the surrounding platform capabilities that support the end-to-end training lifecycle, such as experiment management, artifact management, and other core systems needed to run AI workloads reliably and at scale. Over time, you’ll help evolve these capabilities, improving their reliability, scalability, usability, and overall maturity.
You’ll collaborate closely with research scientists, AI engineers, product teams, and cloud/infrastructure teams to ensure workloads can run efficiently, reproducibly, and reliably at scale. You’ll also help shape the roadmap for the platform by understanding user pain points, improving platform capabilities, and contributing to the long-term direction of Canva’s training infrastructure.
This role is ideal for someone who enjoys working on the systems behind AI — not just the models themselves — and wants to have broad impact across multiple teams.
You’re an engineer who loves building the systems that power AI at scale. You have strong experience in training pipelines, distributed systems, or large-scale AI infrastructure, and you’re excited by the challenge of making training workloads more reliable, scalable, and efficient.
You bring strong experience working with Kubernetes and containerized workloads. Experience with training infrastructure or with distributed frameworks such as Ray, PyTorch distributed training, or similar technologies will be highly valuable.
You’re also familiar with the modern cloud and infrastructure services that underpin high-performance AI workloads — for example, high-performance storage, HPC environments, fast interconnects and networking capabilities, or services such as FSx, EFA, and related infrastructure commonly used in large-scale training environments.
You bring a strong sense of ownership and enjoy working on complex, cross-cutting problems that impact multiple teams. You’re comfortable collaborating with engineers, applied scientists, and infrastructure partners, and you care deeply about scalability, reliability, usability, and developer experience. Most importantly, you’re motivated by the opportunity to help Canva build the platform foundations that enable AI-powered creativity at scale.
Posted June 7, 2026