Research Scientist, Interpretability

Anthropic is seeking a Research Scientist for its Interpretability team to reverse-engineer how modern language models work. The role involves developing methods to understand LLMs, designing experiments, building infrastructure, and collaborating with colleagues to make AI systems safe and interpretable.

About the Role

The Interpretability team at Anthropic is dedicated to reverse-engineering how trained models work, on the view that a mechanistic understanding is crucial for making advanced AI systems safe. This role focuses on mechanistic interpretability, which aims to discover how neural network parameters map to meaningful algorithms. You will help build a solid foundation for understanding neural networks and ensuring their safety, collaborating with teams such as Alignment Science and Societal Impacts.

Responsibilities

Develop methods for understanding Large Language Models (LLMs) by reverse-engineering algorithms learned in their weights.
Design and run robust experiments, both in toy scenarios and at scale in large models.
Create and analyze new interpretability features and circuits to better understand model functionality.
Build infrastructure for running experiments and visualizing results.
Work with colleagues to communicate results internally and publicly.

You may be a good fit if you:

Have a strong track record of scientific research (in any field) and some prior work on interpretability.
Enjoy team science and collaborative discovery.
Are comfortable with messy experimental science and inventing new methodologies.
View research and engineering as one integrated activity that spans writing code, designing experiments, and interpreting results.
Can clearly articulate motivations for your work and effectively communicate learned insights, including null results.

Required Skills

Familiarity with Python.

Logistics

Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience.
Required field of study: A field relevant to the role as demonstrated through coursework, training, or professional experience.
Location-based hybrid policy: Currently, all staff are expected to be in one of our offices at least 25% of the time. This role is based in San Francisco, CA, but remote work may be considered for exceptional candidates on a case-by-case basis.
Visa sponsorship: Visa sponsorship is available.