onsite
LLM Engineer Reinforcement Learning - 42dot
LLM Engineer
Lead the design and optimization of large language model training pipelines, applying reinforcement learning and preference‑optimization techniques such as PPO, GRPO, and DPO to deliver high‑quality, self‑improving generative AI services.
About the role
Key Responsibilities
- Design and implement efficient LLM training pipelines for production use.
- Apply policy‑gradient RL algorithms (PPO, GRPO) and direct alignment algorithms (DPO) to improve training efficiency and model alignment.
- Enhance generation accuracy and stability through reward shaping and self‑refinement mechanisms.
- Develop foundation models that integrate external knowledge and APIs, enabling dynamic tool selection based on user prompts.
- Troubleshoot GPU‑accelerated training and scale workloads using distributed training frameworks.
Requirements
- 3+ years of experience in deep learning or NLP (master’s candidates welcome).
- Proficient in Python and PyTorch for model design, training, evaluation, and optimization.
- Hands‑on experience with large‑scale LLM training on GPU clusters and distributed systems.
- Strong understanding of reinforcement learning methods for language models.
- Excellent problem‑solving skills and ability to iterate rapidly on new methodologies.
Skills
Python, PyTorch, Reinforcement Learning, LLM