Sebastian Farquhar
@sebfar.bsky.social
2.3K followers 58 following 22 posts
Senior Research Scientist at Google DeepMind. AGI Alignment researcher. Views my dog's.
sebfar.bsky.social
By default, LLM agents with long action sequences use early steps to undermine your evaluation of later steps; a big alignment risk.

Our new paper mitigates this, keeps the ability for long-term planning, and doesn't assume you can detect the undermining strategy. 👇
davidlindner.bsky.social
New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward?

Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them!

Inspired by myopic optimization but better performance – details in🧵
sebfar.bsky.social
Reducing unnecessary action *does* drive growth. We are all more productive when we achieve the same things with fewer inputs; wasting citizens' time makes the whole country less productive. Create slack in people's lives and watch what they create with it!
sebfar.bsky.social
Interesting analogy, because of course the Dreadnoughts were mostly militarily useless and were obsoleted by changing strategic considerations before they were ever deployed.
sebfar.bsky.social
I desperately want to know what experience made you try out this prompt. Who hurt you?
sebfar.bsky.social
Interesting. I guess I'm surprised that oil prices would have such a big effect on total fossil fuel CO2 emissions (presumably mostly coal over the period?). But maybe substitutability links them enough.
sebfar.bsky.social
Actually just zoomed in on the data viewer. It does look like 1973 is the break point. Still curious about why the effect was so persistent.
sebfar.bsky.social
Why did land use emissions shrink lots between 1960-70 and then stop shrinking?

Why did the oil price shock lead to sustained flat per capita fossil fuel emissions? It was short. Also it started after the trend breaks.
sebfar.bsky.social
I'm surprised that the per capita global emissions look like they are trending pretty flat from 1950ish, much earlier than I would have guessed. Presumably many people greatly increased their energy consumption after then? Do you know what is driving this?
sebfar.bsky.social
Updated! Keep 'em coming.
sebfar.bsky.social
Help me grow this starter pack for technical researchers working on AGI safety! go.bsky.app/D6P44sC Some flex, but aiming for mostly technical research rather than governance/strategy. Who am I missing?
sebfar.bsky.social
@maosbot.bsky.social what do you think, do you belong on this list? I think most of your research isn't quite in this area but not sure how you self-identify on research focus at the moment.
sebfar.bsky.social
Weak signal perhaps, but you are one of two accounts on Twitter that I genuinely miss here. If you did make the leap that would be lovely :D
sebfar.bsky.social
Agreed. I basically don't believe the result at all. Seems like its memetic strength is that it lets you feel well informed.
sebfar.bsky.social
You too! Just DMed you :D
sebfar.bsky.social
Strongly agree. On a cold winter day they are basically a pure comfort upgrade. Also great for hayfever.
sebfar.bsky.social
The fact that every field that has tried to have a reproducibility crisis has been able to have one suggests that the way journals have done it for decades underinvests in finding critical flaws in papers, and that retractions are too rare and too late to depend on.
sebfar.bsky.social
I've seen at least a couple cases where a very high effort public review identified a significant flaw that the reviewers had missed. Losing that would be a real cost.
sebfar.bsky.social
Entertaining essay about how the decline in practical engineering education has been devastating for *checks notes* professional criminal safe crackers. (Ok, mostly just a fun history of safe cracking.) www.timhunkin.com/94_illegal_e...
timhunkin/illegal engineering
www.timhunkin.com
sebfar.bsky.social
And for readers! Twitter has been getting gradually more boring. Turns out this whole hyperlink thing is a big deal for the internet.
sebfar.bsky.social
Something I loved most about the internet in the 2000s was the idiosyncratic personal webpages that some people had put a crazy amount of time and effort into.

These pages must still exist, right? What are the best ones you know of?