onsite
Member of Technical Staff - Post-Training and RL
As a Member of Technical Staff, you will tackle critical post-training and reinforcement learning challenges, focusing on reward modeling, preference optimization (RLHF/DPO), and improving reasoning and truthfulness in AI models. This role is for individuals passionate about building incredibly useful AI models and pushing the boundaries of reinforcement learning and alignment methods.
ABOUT THE ROLE:
- You will work on the most critical post-training and reinforcement learning challenges at any given time, including reward modeling, preference optimization (RLHF/DPO), and RL for improving reasoning, truthfulness, and real-world capabilities.
- You will get clarity on your first project before an offer.
BASIC QUALIFICATIONS:
- You believe truth-seeking AI is the most important and challenging problem.
- You are obsessed with building incredibly useful models through post-training and RL techniques.
- You are a power user of AI models and eager to push the boundaries of what’s possible with reinforcement learning and alignment methods.
- Previous work on post-training or RLHF, or experience training models used by millions of people, is a big plus, but relevant experience is not required.
- You take pride in your work and thrive in meritocratic environments.