Ólafur Páll Geirsson
@geirsson.com
610 followers 720 following 280 posts
Building Agents at Sourcegraph. Posts about coding, AI, and family (3 kids). @olafurpg elsewhere. Based in Oslo, Norway. https://geirsson.com
The best mental model for buying strollers is to think of renting them by the month. Resale value holds up well, so the final cost isn’t so bad even for the $2k premium twin strollers.
The usual answer is “no” whenever people ask whether LSP can be used in a novel way.

The protocol’s strength is also its weakness: it’s very much optimized around a single human user interacting with an IDE.
Meanwhile we’ll get ads in ChatGPT.
Anthropic is the king of function calling and deserves their incredible revenue growth. They’ve paved the way for AI agents, not OpenAI or Google. It’s only a matter of time before Google gets the memo and Gemini starts taking function calling more seriously.
Three kids done with chickenpox this month.
On a second iteration, it seems like it's the web tool that's causing trouble. Disabling the web tool makes Claude 4 reach the right syntactic solution, although not with the optimal token edits. Goes to show that you need to be careful with what tools you're exposing. Less is more.
Sonnet 3.7 is the only model I've seen that delivers the perfect solution: it replaces the tokens for `.` and `apply` and nothing else. All other models I've tested use the worse tree replacement APIs.
Surprisingly, Sonnet and Opus 4 both fail on one of my go-to codegen tests for new models:

> Implement a Scalafix rule that converts foo.apply(...) to foo(...) and explain why it's semantic or syntactic

They both think it needs to be semantic (i.e., have access to types and symbols).
Amp Tab is coming along nicely; it's not too far from being able to replace Cursor Tab as my daily driver.
Last note: even with code that can be unit tested, I still think most of the tests that AI generates are crap. AI-generated commit messages also miss the point. I'm seeing lots of PRs now where people add AI-generated tests that aren't testing anything meaningful.
The Dwarkesh episode still gave a fresh perspective on how these models work, and I have probably underestimated how powerful they will become. If you're still judging AI capabilities by today's products and today's models then you are probably also underestimating how weird things are going to get.
I am knee-deep in the AI hype, and I don't think software engineering will ever be the same again. I love working on ampcode.com and I see daily anecdotes of how AI coding is turning software development upside-down for our users.
Amp
Everything will change.
ampcode.com
Even components that can be unit tested or e2e tested via behavioral assertions have lots of implicit constraints around latency or how features interact with each other in long-running user sessions, and those are impractical to test in an automated fashion.
The fallacy is thinking that all software engineers do is deliver code that can be tested in isolation, and AI is very good at doing that now. The problem is that tests only cover maybe 0-50% of real-world constraints.
I keep shaking my head hearing AI folks claim software engineering will be automated this year. After listening to this conversation, I at least better understand what they mean by this. These AI researchers are super smart, but they're also sort of clueless about what "software engineering" is.
The Dwarkesh episode on Claude 4 is the most in-depth, balanced, and (almost) non-hype conversation I have heard on why AI researchers believe AGI is around the corner: open.spotify.com/episode/3H46...
How Does Claude 4 Think? — Sholto Douglas & Trenton Bricken
Dwarkesh Podcast · Episode
open.spotify.com
Memory reminds me of the Facebook feed circa 2016. It was clearly beneficial for the company and it sure boosted engagement, but I deleted my Facebook account and was better off for it.
Memory in AI chatbots is overrated: it turns the LLM into a sycophant by tying every response to random pieces of information that got extracted from past conversations.

I’m sure memory is great for engagement/likability, but it’s turned me off ChatGPT personally.
After starting work on Amp (ampcode.com):

- No meetings
- No code review, just push to main
- Take responsibility for your changes
- Rarely need to create a branch off main
- Auto-release every few hours
- Prioritize user bug reports whenever possible
At the risk of being pedantic, when many people say “one shot” they actually mean zero-shot pass@1.

Technically, one shot means including one example output in the prompt, and most prompts don’t do that.

Not blaming anyone; I even catch myself saying “one shot” when I mean pass@1.
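
To make the distinction concrete, here is a rough sketch rather than anyone's official definition. The helper names `build_prompt` and `pass_at_k` are made up for illustration, and the estimator is the commonly used unbiased pass@k formula from code-generation benchmarks, assuming n sampled attempts of which c are correct:

```python
from math import comb


def build_prompt(task, examples=()):
    """Hypothetical helper: zero-shot if `examples` is empty, one-shot if it has one pair."""
    # Each example is a (task, answer) pair shown to the model before the real task.
    parts = [f"Task: {ex_task}\nAnswer: {ex_answer}" for ex_task, ex_answer in examples]
    parts.append(f"Task: {task}\nAnswer:")
    return "\n\n".join(parts)


def pass_at_k(n, c, k):
    """Estimated chance that at least one of k samples is correct, given c correct out of n."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k samples include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Zero-shot: the prompt holds only the task itself.
zero_shot = build_prompt("Reverse a string")

# One-shot: the prompt holds exactly one worked example before the task.
one_shot = build_prompt("Reverse a string", [("Uppercase a string", "s.upper()")])

# pass@1 reduces to the fraction of correct samples, c / n.
first_try_rate = pass_at_k(n=10, c=3, k=1)
```

So “it solved it on the first try with no examples” is zero-shot plus pass@1: the first term describes the prompt, the second describes how many attempts are counted.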
Contrary to popular belief, the models are surprisingly bad at writing CSS.
People who are excited about AI because it helps them build more expertise are on a different trajectory from people who are excited because it lets them skip building expertise.

Concrete example: I love AI because it helps me learn CSS faster, not because AI writes all my CSS so I don’t have to learn it.
git worktrees are overrated; they're a performance optimization that only makes sense when working in a repo that's super slow to clone.

For normal repos, just clone twice and enjoy benefits like being able to check out the main branch in both clones at the same time.
Sprinkling Copilot dependencies across the VS Code codebase is a great technique to make it more annoying to keep a fork up-to-date. Well played, Microsoft.