(Senior) RE/RS

METR
On-site
Berkeley

About METR


We are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks, and mitigations, with a specific focus on threats related to autonomy, AI R&D automation, and alignment. Our work advances the science of AI measurement by studying frontier AI systems' ability to complete complex tasks without human input, and by directly carrying out those measurements to inform risk assessments and consensus within the AI industry, among policymakers, and among the public. Our work has been cited by NIST, a previous US President, the UK Government, Nature, The New York Times, and Time Magazine. Through our work with leading AI labs, governments, and academia, we ensure that our insights can quickly be leveraged to promote the safe development of increasingly powerful AI systems. We believe it is robustly good for civilization to have a clear understanding of what types of danger AI systems pose and how high the risk is, and we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time.


What We're Looking For

We're seeking a researcher to help us better understand AI capabilities. Previous work in this vein includes agent time horizons, a commonly used metric for measuring AI progress, and randomized controlled trials (RCTs) on open-source developer productivity.


We're excited about candidates from a wide range of backgrounds. If you're scrappy, smart, and driven to better understand model capabilities, please apply. We're excited to chat with researchers, engineers, and startup founders alike.



Requirements for this role:
  • You can write code. At the very least, you should be able to quickly write a data analysis script in Python to answer an important question. Bonus points if you can write a clean PR too.
  • You're excited to get your hands dirty. METR researchers often interact with LLMs in a wide variety of scenarios, read lots of agent transcripts, and closely review human outputs (e.g. video recordings of developers in our productivity RCT).
  • You are undaunted by open-ended mandates. You can take a confusing or ill-posed question and produce insightful and helpful frameworks/proposals/results.
  • You should be able to read, understand, and critique a research proposal. You're able to understand how particular projects fit into METR's overall mission.
  • You're a good written communicator. Bonus points if you can write a great paper.
  • You work fast and are highly reliable.


Possible projects you may take on in your first three months:
  • Lead a project investigating transcripts as a source of evidence about agent capabilities. How do agents perform on different types of tasks with real users? Create metrics that speak to the degree of uplift AI agents provide, and collect these metrics from AI R&D-relevant companies.
  • Improve METR's time-horizon metric ("Moore's law for AI agents") to make it more externally valid, more interpretable, and more predictive of threat-model-relevant capabilities. Make this metric the single most useful source of evidence for interpreting the rate of AI progress.
  • Design and build experiments testing agent capabilities in the wild. Create a new source of evidence for us to better triangulate agent capabilities and limitations.
  • Lead large-scale human-subjects experiments measuring the impacts of AI agents on economically valuable R&D.


$257,795 - $450,885 a year

Our Culture


METR is a mission-driven organization. We believe our work can meaningfully shape humanity's future for the better, and we want to be the best people in the world doing this work. We have a tight-knit, collaborative research culture rooted in truth-seeking and integrity. We're fiercely committed to producing high-quality, trustworthy science. We're honest and transparent about our results, especially when they may go against the grain. We've earned trust as reliable partners who handle confidential information with care. We maintain a low-ego, drama-free environment focused on what matters.


Hybrid Requirements: Our technical team members are in our office in Berkeley 3-5 days/week. Please let us know in your application if this is a constraint. If you lack US work authorization and would like to work in person (strongly preferred), we can likely sponsor a cap-exempt H-1B visa for this role.


We encourage you to apply even if your background may not seem like the perfect fit! We would rather review a larger pool of applications than risk missing out on a promising candidate for the position.


We are committed to diversity and equal opportunity in all aspects of our hiring process. We do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. We welcome and encourage all qualified candidates to apply for our open positions.
