onsite

LLM Engineer Reinforcement Learning - 42dot

LLM Engineer

Lead the design and optimization of large language model training pipelines, applying advanced RL techniques such as PPO, GRPO, and DPO to deliver high‑quality, self‑improving generative AI services.

About the role

Key Responsibilities

Design and implement efficient LLM training pipelines for production use.
Apply RL-based and direct alignment algorithms (PPO, GRPO, DPO) to improve training efficiency and model alignment.
Enhance generation accuracy and stability through reward shaping and self‑refinement mechanisms.
Develop foundation models that integrate external knowledge and APIs, enabling dynamic tool selection based on user prompts.
Troubleshoot GPU‑accelerated training and scale workloads using distributed training frameworks.

Requirements

3+ years of experience in deep learning or NLP (master’s candidates welcome).
Proficient in Python and PyTorch for model design, training, evaluation, and optimization.
Hands‑on experience with large‑scale LLM training on GPU clusters and distributed systems.
Strong understanding of reinforcement learning methods for language models.
Excellent problem‑solving skills and ability to iterate rapidly on new methodologies.

Skills

Python, PyTorch, reinforcement learning, LLM

Company: 42dot

Department: Research

Location: Pangyo, Korea, Republic of

Experience: 3+ years

Tenure: full-time

Level: Mid-Level

Posted June 21, 2026