Geoffrey Irving
@girving.bsky.social
Chief Scientist at the UK AI Security Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc.
Pinned
Do you want to fund AI alignment research?

The AISI Alignment Team and I have reviewed >800 Alignment Project applications from 42 countries, and ~100 of them are very promising. Unfortunately, this means we have a £13-17M funding gap! Thread with details! 🧵
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists!

Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.

alignmentproject.aisi.gov.uk
The Alignment Project by AISI — The AI Security Institute
The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.
alignmentproject.aisi.gov.uk
A perk of being an American living in London who is from Alaska is that frequently when talking about temperatures I can refer to just "40 below" with no qualifiers.
November 28, 2025 at 10:32 AM
The UK AI Security Institute ran an Alignment Conference from 29-31 November in London! The goal was to gather a mix of people experienced in and new to alignment, and get into the details of novel approaches to alignment and related problems. Hopefully we helped create some new research bets! 🧵
November 13, 2025 at 5:00 PM
Reposted by Geoffrey Irving
🚨New paper🚨

From a technical perspective, ensuring the safety of open-weight models is AI safety in hard mode. But there's still a lot of progress to be made. Our new paper covers 16 open problems.

🧵🧵🧵
November 12, 2025 at 2:04 PM
There is a real chance that my most important positive contribution to the world will have been to say something wrong on the internet.
November 10, 2025 at 10:24 AM
The UK AISI Cyber Autonomous Systems Team is hiring propensity researchers to grow the science around whether models *are likely* to attempt dangerous behaviour, as opposed to whether they are capable of doing so. 🧵

job-boards.eu.greenhouse.io/aisi/jobs/47...
Research Scientist - CAST Propensity
London, UK
job-boards.eu.greenhouse.io
November 7, 2025 at 9:14 AM
Spooky:

import Batteries.Data.UInt

-- `UInt64.ofNat UInt64.size` wraps around to 0, so subtracting 1 underflows
-- to the maximum `UInt64` value.
def danger : UInt64 := UInt64.ofNat UInt64.size - 1
-- The kernel evaluates `danger` to the correct value...
theorem danger_eq_large : danger = 18446744073709551615 := by decide +kernel
-- ...but compiled evaluation disagrees, so `native_decide` proves a different value.
theorem danger_eq_one : danger = 1 := by native_decide
-- The two theorems together prove `False`: a soundness bug via `native_decide`.
theorem bad : False := by simpa using danger_eq_large.symm.trans danger_eq_one
October 31, 2025 at 10:04 PM
Reposted by Geoffrey Irving
the time it would have taken me would probably have been of order of magnitude an hour (an estimate that comes with quite wide error bars). So it looks as though we have entered the brief but enjoyable era where our research is greatly sped up by AI but AI still needs us. 3/3
October 31, 2025 at 7:25 PM
Reposted by Geoffrey Irving
I published a new post on my rarely updated personal blog! It's a sequel of sorts to my Quanta coverage of the Busy Beaver game, focusing on a particularly fearsome Turing machine known by the awesome name Antihydra.
Why Busy Beaver Hunters Fear the Antihydra
In which I explore the biggest barrier in the busy beaver game. What is Antihydra, what is the Collatz conjecture, how are they connected, and what makes them so daunting?
benbrubaker.com
October 27, 2025 at 4:04 PM
Another strong transition from @matt-levine.bsky.social.
October 23, 2025 at 7:59 PM
New AISI report mapping cruxes for whether AI progress towards systems near or beyond human level on most cognitive tasks might be fast or slow. The goal is not to resolve uncertainties but to reflect them: we don't know how AI will go, and we should plan accordingly!

www.aisi.gov.uk/research/und...
Understanding AI Trajectories: Mapping the Limitations of Current AI Systems
www.aisi.gov.uk
October 23, 2025 at 3:17 PM
New open source library from the UK AI Security Institute! ControlArena lowers the barrier to secure and reproducible AI control research, to boost work on blocking and detecting malicious actions in case AI models are misaligned. In use by researchers at GDM, Anthropic, Redwood, and MATS! 🧵
October 22, 2025 at 6:04 PM
There's a nice recent post by @tobyord.bsky.social on the efficiency of pretraining vs. RL, arguing that RL can learn at most 1 bit per episode given binary reward. It's right that RL is less efficient, but 1 bit is not actually a limit in practice. 🧵 on why:

www.tobyord.com/writing/inef...
The Extreme Inefficiency of RL for Frontier Models — Toby Ord
The new scaling paradigm for AI reduces the amount of information a model could learn per hour of training by a factor of 1,000 to 1,000,000. I explore what this means and its implications for scaling...
www.tobyord.com
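The 1-bit figure is just the Shannon entropy of a binary reward signal; here is a quick numerical illustration of that cap (my own sketch, not code from either thread):

```python
import math

def reward_entropy_bits(p: float) -> float:
    """Shannon entropy (in bits) of a Bernoulli(p) binary reward.

    This is the most information a single binary observation can carry,
    maximised at p = 0.5 and vanishing for deterministic rewards.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a reward that is always 0 or always 1 carries no information
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

# A fair 0/1 reward carries exactly 1 bit; a lopsided one carries less.
print(reward_entropy_bits(0.5))             # → 1.0
print(round(reward_entropy_bits(0.9), 3))   # → 0.469
```

The 🧵 above is about why, despite this cap on the reward signal itself, 1 bit per episode is not the practical limit on what RL teaches the model.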
October 16, 2025 at 8:53 AM
Is there Matt Levine but for pure mathematics?
October 1, 2025 at 5:30 PM
Ominous start to a Wikipedia page about a formula...

en.wikipedia.org/wiki/Fa%C3%A...
September 29, 2025 at 9:02 PM
Reposted by Geoffrey Irving
Amongst the projects funded is my project www.renaissancephilanthropy.org/a-dataset-of... to create what in 2025 is a super-hard dataset of pairs (informal hard proof, formal statement) of recent results from top journals. The challenge for the machine is to formalise the rest of the paper.
www.renaissancephilanthropy.org
September 18, 2025 at 8:25 AM
Cas is very good and you should hire him as faculty!
📌📌📌
I'm excited to be on the faculty job market this fall. I just updated my website with my CV.
stephencasper.com
Stephen Casper
Visit the post for more.
stephencasper.com
September 4, 2025 at 12:38 PM
From near the end of Sleepwalkers, by Christopher Clark, as World War I starts.
August 23, 2025 at 3:40 PM
Reposted by Geoffrey Irving
I'm honored to serve as Expert Advisor for "The Alignment Project", an international initiative dedicated to ensuring AI systems are safe and beneficial. They are providing significant funding, compute, and collaboration opportunities for researchers---including those in cogsci/neuro. Please apply!
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists!

Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.

alignmentproject.aisi.gov.uk
The Alignment Project by AISI — The AI Security Institute
The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.
alignmentproject.aisi.gov.uk
August 20, 2025 at 5:54 PM
The correct mathematical definition is the one that makes the most intermediate lemmas happen to be true, along the way to the result you care about.
August 20, 2025 at 5:53 PM
Reposted by Geoffrey Irving
🧵 New paper from UK AISI x @eleutherai.bsky.social that I led with @kyletokens.bsky.social:

Open-weight LLM safety is both important & neglected. But filtering dual-use knowledge from pre-training data improves tamper resistance *>10x* over post-training baselines.
August 12, 2025 at 11:45 AM
In Lie group theory, the Killing form tells you which elements correspond to functionals that annihilate a given subspace. It is named after Wilhelm Killing.

en.wikipedia.org/wiki/Killing...
Killing form - Wikipedia
en.wikipedia.org
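For reference, the standard definition (a textbook fact, not from the post): for a Lie algebra $\mathfrak{g}$, the Killing form is

\[
B(X, Y) = \operatorname{tr}(\operatorname{ad} X \circ \operatorname{ad} Y),
\]

and $X \mapsto B(X, \cdot)$ sends each element to a linear functional on $\mathfrak{g}$, so the $B$-orthogonal complement of a subspace is exactly the set of elements whose functionals annihilate it.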
August 10, 2025 at 5:55 PM
Automatic routing to models of varying levels of capability may be bad for the quality of AI discourse, even if it is a great idea in other respects.
August 9, 2025 at 7:00 PM
My hobby is finding compiler bugs in Lean, apparently.
August 3, 2025 at 9:36 PM
Reposted by Geoffrey Irving
This newsletter has been an embarrassment internally, but its failure isn’t a big concern. www.bloomberg.com/opinion/news...
You Can Insider Trade NFTs Now
Not legal advice. Also Builder.ai, Boring Co., Harvard and AI.
www.bloomberg.com
July 31, 2025 at 6:00 PM