onsite

Research Engineer, Training & Inference

Harmonic is seeking a Research Engineer focused on Training & Inference to optimize their proprietary reinforcement learning stack. This role involves end-to-end ownership, from low-level environment simulators to distributed training and inference engines, with a strong emphasis on maximizing throughput and performance for foundation model workloads.

About the Role

We are developing reinforcement learning systems at a scale where standard abstractions frequently fail. Unlike labs that operate primarily through high-level wrappers, we own the entirety of our RL stack. This ownership spans from low-level environment simulators and custom communication primitives to our distributed training loops and inference engines. We are seeking engineers who view existing libraries as a baseline and the speed of the hardware itself as the true target. You will be responsible for the architecture powering our agents, with a relentless focus on maximizing the throughput of our reinforcement learning and production workflows.

Key Responsibilities

Total Stack Ownership: Maintain and optimize our proprietary RL training and serving infrastructure. You have the authority to refactor any layer, from the Python API down to the CUDA kernels, to achieve peak performance for foundation model workloads.
Optimized Training: Maximize the throughput of our reinforcement learning system, from data generation to model training, using sharded multi-node training and inference algorithms.
High-Performance Serving: Optimize our inference stack for high-throughput reinforcement learning and low-latency LLM production traffic. Tune the inference engine, router, and scheduler, down to custom kernels if needed.
Compute Optimization: Identify and resolve performance bottlenecks within our distributed clusters, ensuring optimal throughput and memory efficiency for multi-billion-parameter models by balancing memory constraints against compute-heavy training cycles.

Minimum Qualifications

BS in Computer Science or a related technical field, or equivalent industry experience.
2+ years of relevant, hands-on industry experience.
Proficiency in Python.
Experience building or maintaining components within ML frameworks (e.g., PyTorch, JAX, or TensorFlow).
One of the following:
- Understanding of distributed training concepts and collective communication primitives (e.g., NCCL), or
- Practical experience deploying and profiling models on GPU-accelerated cloud infrastructure.

Preferred Qualifications

MS or PhD in Computer Science, Mathematics, or a related field.
5+ years of relevant, hands-on industry experience.
Proficiency in C++.
Experience writing or improving kernels (Triton, CuTeDSL, TileLang, CUDA, CUTLASS, ThunderKittens) to resolve low-level bottlenecks.
Proven success deploying performant inference at scale using open-source or custom inference engines, routers, etc.
Direct experience scaling models via FSDP, Tensor Parallelism, or related sharding techniques on multi-node GPU clusters.
Experience designing reinforcement learning systems for high-throughput training and asynchronous data sampling.
