Research Engineer/Research Scientist – Model Transparency

The AI Security Institute is seeking Research Scientists and Research Engineers for its Model Transparency team to understand and address risks as AI models become less transparent. This role involves researching how oversight is declining and developing methods to detect, measure, and mitigate potential issues to ensure reliable evaluations of frontier AI systems.

About the role

About the AI Security Institute

The AI Security Institute is the world's largest and best-funded team dedicated to understanding advanced AI risks and translating that knowledge into action. We’re in the heart of the UK government with direct lines to No. 10 (the Prime Minister's office), and we work with frontier developers and governments globally.

We’re here because governments are critical for advanced AI going well, and UK AISI is uniquely positioned to mobilise them. With our resources, unique agility and international influence, this is the best place to shape both AI development and government action.

Team Description

The ability to effectively evaluate and monitor AI systems will grow in importance as models become more capable, autonomous, and integrated into society. If models can detect and game evaluations, obscure their reasoning, or behave differently under observation, the safety claims that governments and developers rely on become unreliable. Understanding and addressing these risks is essential to ensuring that oversight of advanced AI systems keeps pace with their capabilities.

The Model Transparency team is a research team within AISI focused on ensuring that evaluations, assessments, and monitoring of frontier AI systems remain reliable as models become less transparent. We research how and why oversight is declining – through phenomena such as evaluation awareness, unfaithful chain-of-thought reasoning, and changes in model architectures – and develop methods (including white-box and black-box methods) to detect, measure, and mitigate potential issues. We share our findings with frontier AI companies (including Anthropic, OpenAI, and DeepMind), UK government officials, and allied governments to inform their deployment, research, and policy decisions, and we also release them publicly. We also work directly with safety teams at frontier labs, contributing to safety case reviews and helping improve their alignment evaluation methodology.

Our recent work includes auditing games for sandbagging, reproducing natural emergent misalignment from reward hacking, and identifying open-weight language models that game propensity evaluations.

Role Description

We're looking for Research Scientists and Research Engineers for the Model Transparency team with expertise in technical AI safety – such as interpretability, capability or alignment evaluations, or model transparency – or with broader experience in frontier LLM research and development. An ideal candidate would have a strong track record of high-quality research in technical AI safety or adjacent fields.

Research Scientists drive the technical substance of our work – staying abreast of the literature, proposing and designing experiments, conducting rigorous analyses, and owning the evidence stack from experiment through to written output. They write, critique, and strengthen the team's reports and publications.
Research Engineers build the systems and tooling that make our research possible and fast – scaling experimental workflows, automating processes, solving infrastructure challenges, and creating systems that accelerate the entire team's output.

We're interested in candidates along the spectrum between Research Engineers and Research Scientists. The application form will ask you to indicate which role you lean towards.

The team is led by Joseph Bloom and advised by Geoffrey Irving. You'll work with talented, mission-driven technical staff across AISI, including alumni from Anthropic, OpenAI, DeepMind, and top universities. You may also collaborate with external research teams, including those at frontier AI labs, METR, and FAR.

We are open to hires across a range of experience levels.

This role requires three days a week in person, with flexibility for occasional periods of remote working.

Representative Projects You Might Work On

Developing a chain-of-thought monitorability benchmark and comparing monitorability properties across frontier AI systems, leveraging AISI’s unique access to reasoning traces from multiple labs.
Designing and running experiments on open-weight models to study alignment and oversight-relevant phenomena – such as reproducing emergent misalignment from reward hacking, or red-teaming techniques like inoculation prompting and character training.
Using white-box and interpretability methods – such as activation oracles, sparse autoencoders, or probes – to detect misalignment that isn’t visible through behavioural evaluation alone.
Building tooling and infrastructure for our research – including agent orchestration, large-scale RL pipelines, mechanistic interpretability methodologies, and auditing agents.

The work could also involve:

Reviewing frontier lab risk assessments and safety cases, providing independent analysis of alignment claims before deployment decisions.
Conducting literature reviews and expert interviews to map the state of model transparency risks and inform AISI’s strategic priorities.
Translating technical findings into actionable insights for AISI evaluation teams, UK government officials, and international partners.

What we’re looking for

Requirements for both roles:

A get-things-done mindset – you take ownership, move fast, and care about shipping work that matters.
A combination of self-sufficiency and enthusiasm for teamwork – you’re equally happy defining your own agenda and contributing to shared goals. You’re excited about growing, giving and receiving feedback, and building something together.
An ability to build, supervise and orchestrate AI agents to complete tasks effectively, while verifying and maintaining quality of work.
A demonstrated track record of relevant, high-quality work – whether technical publications, blog posts, or other publicly visible contributions.

Research Scientists – our requirements are:

Hands-on research experience with large language models (LLMs) – such as evaluating or fine-tuning models, developing and testing monitors, or auditing models with white-box or black-box techniques.
Experience writing research code for machine learning experiments, including experience with ML frameworks like PyTorch or evaluation frameworks like Inspect.
An ability to write high-quality, concise research proposals that are well-motivated, tractable, and coherent.
Good research taste – an ability to identify what’s important, choose productive directions, and avoid getting lost in dead ends.
An ability to read research critically, identify flawed arguments, and poke holes in safety claims.

We don’t expect RS candidates to meet all of the following, but they are a useful signal:

Experience designing and running alignment evaluations or working on model transparency research.
Experience with interpretability or white-box methods – such as mechanistic interpretability, sparse autoencoders, probing, or activation analysis.
Familiarity with alignment literature, current methods for post-training and aligning L
