onsite

Machine Learning Engineer - Goliath Partners

This Machine Learning Engineer role focuses on optimizing large multimodal and diffusion models for GPU performance, using C++, CUDA, TensorRT, and Triton to scale inference and training across distributed systems.

About the role

Key Responsibilities

Optimize large multimodal and diffusion models for maximum GPU throughput and efficiency.
Implement and tune C++/CUDA kernels, TensorRT engines, and Triton inference pipelines.
Design and maintain distributed training and inference workflows at scale.
Collaborate with research teams to translate cutting‑edge generative models into production‑ready systems.
Continuously benchmark, profile, and iterate on performance bottlenecks.

Requirements

Strong experience in C++ and CUDA programming for high‑performance ML workloads.
Proficiency with TensorRT, Triton, and GPU profiling tools.
Hands‑on knowledge of large‑scale distributed training and inference frameworks.
Background in multimodal or generative AI models is highly desirable.
Excellent problem‑solving skills and a passion for pushing performance limits.

Skills

python, c, cuda

Company: Goliath Partners

Department: Research

Location: New York, NY, United States

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Salary: 400,000

Posted June 20, 2026