Software Engineer (Large Scale Training)
Lightricks is seeking a Software Engineer to join their ML team, focusing on building and maintaining large-scale model training systems. The role involves optimizing distributed training frameworks, data pipelines, and developer experience for researchers. The ideal candidate will be a strong engineer passionate about solving complex systems problems and performance challenges.
This is a software engineering role on an ML team. You'll own the systems that make large-scale model training fast, reliable, and pleasant to work with: the distributed training framework, the data pipelines feeding it, the performance characteristics of every step on the critical path, and the day-to-day developer experience for the researchers who depend on it.
You don't need to come in as an ML expert. You do need to be a strong engineer who gets excited about hard systems problems: squeezing throughput out of accelerator clusters, hunting down stragglers across hundreds of machines, designing abstractions that hold up as the codebase grows, and making the unglamorous parts of training infrastructure work well.
If you've ever looked at a large-scale system and thought "there's no reason this should be this slow / inefficient / hard to maintain / complex," this role is built for you.
* ML training experience is a bonus. If you have it, great, but we'd rather hire a strong systems engineer who's curious about ML than an ML engineer who's lukewarm about infrastructure.
Posted June 2, 2026