onsite
Machine Learning Engineer - Goliath Partners
ML Engineer
Machine Learning Engineer role focused on optimizing large multimodal and diffusion models for GPU performance, using C++, CUDA, TensorRT, and Triton to scale inference and training across distributed systems.
About the role
Key Responsibilities
- Optimize large multimodal and diffusion models for maximum GPU throughput and efficiency.
- Implement and tune C++/CUDA kernels, TensorRT engines, and Triton inference pipelines.
- Design and maintain distributed training and inference workflows at scale.
- Collaborate with research teams to translate cutting‑edge generative models into production‑ready systems.
- Continuously benchmark and profile workloads, then identify and resolve performance bottlenecks.
Requirements
- Strong experience in C++ and CUDA programming for high‑performance ML workloads.
- Proficiency with TensorRT, Triton, and GPU profiling tools.
- Hands‑on knowledge of large‑scale distributed training and inference frameworks.
- Background in multimodal or generative AI models is highly desirable.
- Excellent problem‑solving skills and a passion for pushing performance limits.