onsite
Machine Learning Engineer, LLM Post-Training
As a Machine Learning Engineer, LLM Post-Training, you will be responsible for driving the post-training of large language models with a strong focus on reinforcement learning. This involves owning the full post-training stack including continuous pre-training, supervised fine-tuning, and RL, as well as preparing the data that powers these processes. You will also collaborate with product and business teams to translate real-world use cases into training objectives and rapidly implement model improvements.
About the Role
We are looking for a hands-on Machine Learning Engineer to drive the post-training of our large language models, with a strong emphasis on reinforcement learning (RL). You will own the full post-training stack — continuous pre-training (CPT), supervised fine-tuning (SFT), and RL — along with the data preparation that powers it. Just as important, you will work directly with product and business teams to translate real-world use cases into concrete training objectives and ship model improvements quickly. This is a high-ownership role for someone who has actually trained models, not just read about it.
Responsibilities
- Lead post-training of our LLMs across the full pipeline: continuous pre-training, SFT, and reinforcement learning, with RL as the primary focus (e.g., RLHF, PPO, GRPO, DPO, and related methods).
- Design, build, and curate the data that drives each training stage — instruction/SFT datasets, preference pairs, reward signals, on-policy rollouts, and rejection-sampled completions — and define data-preparation strategies tailored to specific business needs.
- Partner closely with business and product stakeholders to understand their scenarios, rapidly convert requirements into training plans, and deliver targeted model capabilities on tight timelines.
- Run large-scale training on mid-to-large GPU clusters, applying distributed-training techniques (data parallelism, FSDP, and where relevant tensor/pipeline parallelism) and tuning for throughput and stability.
- Build and maintain evaluation and reward/verifier pipelines to measure model quality, prevent regressions, and ensure training–serving consistency.
- Stay current with post-training research and turn promising techniques into working, production-ready code.
Requirements
- Hands-on LLM post-training experience. You have personally run CPT, SFT, and RL training — with demonstrated, practical RL experience (RLHF / PPO / GRPO / DPO or similar), beyond just launching training scripts.
- Strong data engineering for ML. You can independently design data-preparation plans for a given business scenario — sourcing, cleaning, filtering, labeling strategy, and synthetic/preference data generation — to meet specific product requirements.
- Proven large-scale GPU training ability. You have trained LLMs on mid-to-large GPU clusters and are comfortable with distributed training and debugging at scale.
- Strong PyTorch fundamentals; working familiarity with frameworks such as Hugging Face TRL/Accelerate, DeepSpeed or FSDP, and inference engines like vLLM.
- Solid understanding of tokenization, attention, chat templates, and common failure modes in alignment/agent training.
- A bias toward fast iteration and business impact, with strong communication skills to work across research and product teams.
Preferred Qualifications
- Experience designing reward models or rule-based verifiers for RL.
- Experience with tool-use / agentic model training (function calling, multi-step planning).
- Publications or open-source contributions in LLM post-training or RL.