hybrid

Jailbreaking Lead (Red Team)

FAR.AI is seeking a Jailbreaking Lead (Red Team) to spearhead efforts in finding universal jailbreaks in leading frontier AI models. This senior IC role involves hands-on attack development, breaking models, and setting technical standards for world-class jailbreaking, with an option to lead and build a team over time.

About the Role

FAR.AI is seeking a Jailbreaking Lead (Red Team) whose personal mission and obsession is to jailbreak the world's leading frontier AI models. You will sit at the tip of the spear of one of the world's leading AI red teams, with a single, critical focus: find the universal jailbreaks that no one else can find, in the models used by hundreds of millions of people, and make sure they get fixed.

This is primarily a senior IC role with some management responsibilities, ideally for candidates who want to build and lead a jailbreaking team over time. An IC-only track is also available. Either way, you will spend the majority of your time hands-on, building attacks and breaking frontier models, and setting the technical bar for what a world-class jailbreak looks like.

About FAR.AI

FAR.AI is a non-profit AI research institute dedicated to ensuring advanced AI is safe and beneficial for everyone. Our mission is to facilitate breakthrough AI safety research, advance global understanding of AI risks and solutions, and foster a coordinated global response.

Since our founding in July 2022, we've grown quickly to 45+ staff, produced over 40 influential academic papers, and established leading AI safety events. Our work is recognized globally, with publications at premier venues such as NeurIPS, ICML, and ICLR, and coverage in the Financial Times, Nature News, and MIT Technology Review. Additionally, we help steer and grow the AI safety field by developing research roadmaps with renowned researchers such as Yoshua Bengio; running FAR.Labs, an AI safety-focused co-working space in Berkeley housing 40 members; and supporting the community through targeted grants to technical researchers.

About Red-Teaming at FAR.AI

FAR.AI’s red team is building toward a simple outcome: materially raising the bar for the safety and security of the most widely deployed and capable AI systems in the world. We intend to be the tip of the spear in AI safety: the team that consistently finds the failures others miss, drives real mitigations, and sets the standard that labs and governments converge on. We also use our in-depth understanding of weaknesses in frontier models to advise frontier developers on mitigations, to guide our own research and grant-making for improving model security, and to inform the public of key AI risks.

We are already one of the leading independent red-teaming organizations. Our work has helped most Western frontier model developers improve safeguards through pre- and post-deployment testing (e.g., we have directly influenced safeguards at major frontier developers like OpenAI and Anthropic), and we are increasingly embedded in high-leverage government efforts (e.g., leading a consortium building CBRN evaluations for the European Commission/EU AI Office, and collaborating with the UK AI Security Institute).

You will be the senior technical owner of our jailbreaking practice, reporting to Kellin Pelrine with a dotted line to Edward Yee. In 2026, we are scaling from a strong team with standout wins to a level of impact unprecedented for any AI red team globally:

- Red-teaming all major frontier model releases (closed and open-weight) within days or weeks of release;
- Expanding strategic engagements with governments and conducting pre-deployment testing with most frontier labs;
- Deepening our testing of key risk areas like CBRN, cyber, and agents, and exploring new ones like AI control and alignment;
- Building tools, agents, and insights that raise the global standard for red-teaming.

About the Role

Jailbreaking is the core technical engine of the red team. As Jailbreaking Lead, you own that engine. You are the person who personally breaks the hardest targets, sets the bar the rest of the team pushes toward, and makes sure we keep discovering the highest-severity universal vulnerabilities (the most important vulnerabilities to fix) in the most heavily defended frontier models on the planet, faster than anyone else.

We expect you to spend at least 50-70% of your time hands-on across 2026: breaking models, chaining novel attack classes through defense-in-depth stacks, helping to invent new techniques when existing ones fail, and setting the standard for what constitutes a significant vulnerability and a credible mitigation. The remaining time will go to managing/mentoring ICs, helping to shape the jailbreaking research agenda with Kellin, and making sure our findings land with frontier labs, governments, and the broader field. The rest of the red team will empower your work, whether through direct collaboration and support, novel research and red-teaming infrastructure, or toolkits and agent build-outs.

This is a senior IC role by default, intended to attract a world-class jailbreaker whose personal mission is to find critical jailbreaks in the most heavily defended domains of the leading frontier AI models, and who has a track record of repeatedly doing so. We are open to a management track for candidates who want to hire and lead a jailbreaking team over time. We will not water down the IC bar to support the management track: both versions of this role require you to be, or be on a clear trajectory to being, one of the best jailbreakers in the world.

In practice, this role spans:

Lead jailbreaking on the highest-stakes engagements:
- Personally develop universal and near-universal jailbreaks against frontier closed- and open-weight models, in CBRNE, cyber, agentic security, extreme persuasion, and emerging risk domains;
- Systematically dismantle defense-in-depth stacks (input filters, model-level refusal and safe completion, reasoning monitors, output filters, account-level moderation), chaining novel and established techniques;
- Escalate initial vulnerabilities to expose their most severe form, maximizing universality, success rate, and capability of elicited output;
- Own the technical bar for vulnerability severity and generality on every major engagement.
Push the frontier of jailbreaking techniques:
- Invent new attack classes when existing techniques fail (e.g., we have recently shipped novel attacks against Constitutional Classifiers and fine-tun