Lightnews — Scholar-powered news

Ólafur Páll Geirsson @geirsson.com · Jun 21

Current status

Ólafur Páll Geirsson @geirsson.com · Jun 7

The best mental model for buying strollers is to think of renting them by the month. Resale value stays high, so the final cost isn’t so bad even for the $2k premium twin strollers.

Ólafur Páll Geirsson @geirsson.com · Jun 3

The usual answer is “no” whenever people ask whether LSP can be used in a novel way.

The protocol’s strength is also its weakness: it’s very much optimized around a single human user interacting with an IDE.

Ólafur Páll Geirsson @geirsson.com · May 31

Meanwhile we’ll get ads in ChatGPT.

Ólafur Páll Geirsson @geirsson.com · May 31

Anthropic is the king of function calling and deserves their incredible revenue growth. They’ve paved the way for AI agents, not OpenAI or Google. It’s only a matter of time before Google gets the memo and Gemini starts taking function calling more seriously.

Ólafur Páll Geirsson @geirsson.com · May 30

Three kids done with chickenpox this month.

Ólafur Páll Geirsson @geirsson.com · May 29

On a second iteration, it seems like it's the web tool that's causing trouble. Disabling the web tool makes Claude 4 reach the right syntactic solution, although not with the optimal token edits. Goes to show that you need to be careful with what tools you're exposing. Less is more.

Ólafur Páll Geirsson @geirsson.com · May 29

Sonnet 3.7 is the only model I've seen that delivers the perfect solution: it replaces the tokens for `.` and `apply` and nothing else. All other models I've tested use the worse tree replacement APIs.
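
To make "replaces the tokens for `.` and `apply` and nothing else" concrete, here is a toy sketch in Python. It is not Scalafix code and does not use the Scalafix or Scalameta APIs; the tokenizer and the `drop_explicit_apply` name are assumptions made up for illustration. It only shows that the edit needs tokens, not types or symbols.

```python
import re

# Toy tokenizer (an assumption, far simpler than a real Scala lexer):
# double-quoted strings, identifiers, whitespace runs, then any single character.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\s+|.', re.DOTALL)


def drop_explicit_apply(source: str) -> str:
    """Remove the `.` and `apply` tokens in `<expr>.apply(` and leave every other token as-is."""
    tokens = TOKEN.findall(source)
    out = []
    i = 0
    while i < len(tokens):
        is_explicit_apply = (
            tokens[i] == "."
            and i > 0  # there must be something before the dot
            and i + 2 < len(tokens)
            and tokens[i + 1] == "apply"
            and tokens[i + 2] == "("
        )
        if is_explicit_apply:
            i += 2  # skip `.` and `apply`; the `(` is kept on the next pass
        else:
            out.append(tokens[i])
            i += 1
    return "".join(out)


print(drop_explicit_apply("val x = foo.apply(1, 2)"))  # prints: val x = foo(1, 2)
```

The sketch ignores comments, triple-quoted strings, and line breaks before `.apply`, all of which a real rule working on the syntax tree would have to respect.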

Ólafur Páll Geirsson @geirsson.com · May 29

Surprisingly, Sonnet and Opus 4 both fail on one of my go-to codegen tests for new models:

> Implement a Scalafix rule that converts foo.apply(...) to foo(...) and explain why it's semantic or syntactic

They both think it needs to be semantic (i.e., have access to types and symbols).

Ólafur Páll Geirsson @geirsson.com · May 27

Amp Tab is coming along nicely. It's not too far from being able to replace Cursor Tab as my daily driver.

Ólafur Páll Geirsson @geirsson.com · May 27

Last note: even with code that can be unit tested, I still think most of the tests that AI generates are crap. And the AI-generated commit messages also miss the point. I'm seeing lots of PRs now where people add AI-generated tests that aren't even testing anything meaningful.

Ólafur Páll Geirsson @geirsson.com · May 27

The Dwarkesh episode still gave a fresh perspective on how these models work, and I have probably underestimated how powerful they will become. If you're still judging AI capabilities by today's products and today's models then you are probably also underestimating how weird things are going to get.

Ólafur Páll Geirsson @geirsson.com · May 27

I am knee-deep in the AI hype, and I don't think software engineering will ever be the same again. I love working on ampcode.com and I see daily anecdotes of how AI coding is turning software development upside-down for our users.

Amp

Everything will change.

ampcode.com

Ólafur Páll Geirsson @geirsson.com · May 27

Even components that can be unit tested or e2e tested via behavioral assertions have lots of implicit constraints with respect to latency or how features interact with each other in long-running user sessions, and those are impractical to test in an automated fashion.

Ólafur Páll Geirsson @geirsson.com · May 27

The fallacy is thinking that all software engineers do is deliver code that can be tested in isolation, and AI is very good at doing that now. The problem is that tests only cover maybe 0-50% of real-world constraints.

Ólafur Páll Geirsson @geirsson.com · May 27

I keep shaking my head hearing AI folks claim software engineering will be automated this year. After listening to this conversation, I at least better understand what they mean by this. These AI researchers are super smart, but they're also sort of clueless about what "software engineering" is.

Ólafur Páll Geirsson @geirsson.com · May 27

The Dwarkesh episode on Claude 4 is the most in-depth, balanced, and (almost) non-hype conversation I have heard on why AI researchers believe AGI is around the corner: open.spotify.com/episode/3H46...

How Does Claude 4 Think? — Sholto Douglas & Trenton Bricken

Dwarkesh Podcast · Episode

open.spotify.com

Ólafur Páll Geirsson @geirsson.com · May 25

Memory reminds me of the Facebook feed circa 2016. It was clearly beneficial for the company and it sure boosted engagement, but I deleted my Facebook account and was better off for it.

Ólafur Páll Geirsson @geirsson.com · May 25

Memory in AI chatbots is overrated. It turns the LLM into a sycophant by tying every response to random pieces of information that got extracted from past conversations.

I’m sure memory is great for engagement/likability, but it’s turned me off ChatGPT personally.

Ólafur Páll Geirsson @geirsson.com · May 23

After starting work on Amp (ampcode.com):

- No meetings
- No code review, just push to main
- Take responsibility for your changes
- Rarely need to create a branch off main
- Auto-release every few hours
- Prioritize user bug reports whenever possible

Ólafur Páll Geirsson @geirsson.com · May 22

At the risk of being pedantic, when many people say “one shot” they actually mean zero shot pass@1.

Technically, one shot means including one example output in the prompt, and most prompts don’t do that.

Not blaming anyone; I even catch myself saying “one shot” when I mean pass@1.
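
A minimal sketch of the distinction in Python, assuming a plain "Task/Answer" prompt layout. The function names and the layout are made up for illustration and are not any vendor's API.

```python
def zero_shot_prompt(task: str) -> str:
    """Zero shot: the prompt contains only the task, no example output."""
    return f"Task: {task}\nAnswer:"


def one_shot_prompt(task: str, example_task: str, example_answer: str) -> str:
    """One shot: exactly one worked example precedes the real task."""
    return (
        f"Task: {example_task}\nAnswer: {example_answer}\n\n"
        f"Task: {task}\nAnswer:"
    )


task = "Convert foo.apply(1) to the shorter call syntax"

# pass@1 is about sampling, not prompting: take one completion and check it once.
# Either prompt below can be scored with pass@1.
print(zero_shot_prompt(task))
print(one_shot_prompt(task, "Convert bar.apply(x)", "bar(x)"))
```

What people casually call "one shot" usually corresponds to `zero_shot_prompt` plus a single attempt.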

Ólafur Páll Geirsson @geirsson.com · May 22

Contrary to popular belief, the models are surprisingly bad at writing CSS.

Ólafur Páll Geirsson @geirsson.com · May 22

There’s a different trajectory for the people who are excited about AI because it enables them to build more expertise than for those who are excited because it lets them skip building expertise.

Concrete example: I love AI because it helps me learn CSS faster, not because AI writes all my CSS so I don’t have to learn it.

Ólafur Páll Geirsson @geirsson.com · May 21

git worktrees are overrated. They're a performance optimization that only makes sense when working in a repo that's super slow to clone.

For normal repos, just clone twice and enjoy benefits like being able to check out the main branch in both clones at the same time.

Ólafur Páll Geirsson @geirsson.com · May 19

Sprinkling Copilot dependencies across the VS Code codebase is a great technique to make it more annoying to keep a fork up-to-date. Well played, Microsoft.