Lightnews — Scholar-powered news

Simon Willison

@simonwillison.net

Fun new variant on my pelican riding a bicycle SVG benchmark by BeetleB, who instead used "Write a POV-Ray file that shows a pelican riding on a bicycle" to get ray-traced pelicans: blog.nawaz.org/posts/2025/O...

Claude Opus 4: 3D scene. The bicycle has a sort of square frame in the wrong place, but good wheels. The pelican is stood on top - a large white blob, a smaller white blob head, a cylinder neck and a conical beak in the right place, plus legs that reach out-of-place pedals. A egg floats mysteriously in front of the bird.

GPT-5: The bike is a bit mis-shapen but has most of the right pieces. The pelican has legs that reach the pedals and is bending forward with a two-segmented neck and a good beak. A weird egg floats in the front wheel.

November 9, 2025 at 5:22 PM

Simon Willison

@simonwillison.net

For comparison, here are the pelicans riding bicycles drawn by GPT-5-Codex-Mini (the new model), GPT-5-Codex and full GPT-5 - all produced via the same hacked version of the Codex CLI tool

GPT-5-Codex-Mini. This is terrible. The pelican is an abstract collection of shapes, the bicycle is likewise very messed up

GPT-5 Codex. It's a dumpy little pelican with a weird face, not particularly great but better than Mini.

GPT-5: Much better bicycle, pelican is a bit line-drawing-ish but does have the necessary parts in the right places

November 9, 2025 at 3:47 AM

Simon Willison

@simonwillison.net

OpenAI partially released a new model yesterday called GPT-5-Codex-Mini

No API access yet, but I did some truly horrible things to their Codex CLI app to get it to spit out this SVG of a pelican riding a bicycle

This is pretty bad. The bicycle is just about recognizable - a collection o f abstract lines and two circles - but the pelican is a weird little snow goblin tangled in a bundle of random lines hovering over the rest of the bike

November 9, 2025 at 3:37 AM

Simon Willison

@simonwillison.net

And here's an example of one of my code research prompts

Create a performance benchmark and feature comparison report on PyPI cmarkgfm compared to other popular Python markdown libraries—check all of them out from github and read the source to get an idea for features, then design and run a benchmark including generating some charts, then create a report in a new python-markdown-comparison folder (do not create a _summary.md file or edit anywhere outside of that folder). Make sure the performance chart images are directly displayed in the README.md in the folder.

November 6, 2025 at 4:02 PM

Simon Willison

@simonwillison.net

Here's my research repo - each of the 13 folders is a different research project, and the README is automatically updated by an LLM to include summaries describing each one github.com/simonw/resea...

Screenshot of a README document with a right-side navigation panel; navigation menu shows: Filter headings search box, heading "Research projects carried out by AI tools" followed by project list: sqlite-query-linter (2025-11-04), h3-library-benchmark (2025-11-04), h3o-python (2025-11-03), wazero-python-claude (2025-11-02), datasette-plugin-skill (2025-10-24), blog-tags-scikit-learn (2025-10-24), cmarkgfm-in-pyodide (2025-10-22), python-markdown-comparison (2025-10-22), datasette-plugin-alpha-versions (2025-10-20), deepseck-ocr-nvidia-spark (2025-10-20), sqlite-permissions-poc (2025-10-20), minijinja-vs-jinja2 (2025-10-19), node-pyodide (2025-10-19)

November 6, 2025 at 3:59 PM

Simon Willison

@simonwillison.net

And in case you don't make it as far as the "miscellaneous tips" section, here's a bunch of lessons I learned about working with coding agents that I picked up along the way simonwillison.net/2025/Nov/4/d...

When working on anything relating to plugins it’s vital to have at least a few real plugins that you upgrade in lock-step with the core changes. The tadd and radd shortcuts were invaluable for productively working on those plugins while I made changes to core.
Coding agents make experiments much cheaper. I threw away so much code on the way to the final implementation, which was psychologically easier because the cost to create that code in the first place was so low.
Tests, tests, tests. This project would have been impossible without that existing test suite. The additional tests we built along the way give me confidence that the new system is as robust as I need it to be.
Claude writes good commit messages now! I finally gave in and let it write these—previously I’ve been determined to write them myself. It’s a big time saver to be able to say “write a tasteful commit message for these changes”.
Claude is also great at breaking up changes into smaller commits. It can also productively rewrite history to make it easier to follow, especially useful if you’re still working in a branch.

A really great way to review Claude’s changes is with the GitHub PR interface. You can attach comments to individual lines of code and then later prompt Claude like this: Use gh CLI to fetch comments on URL-to-PR and make the requested changes. This is a very quick way to apply little nitpick changes—rename this function, refactor this repeated code, add types here etc.
The code I write with LLMs is higher quality code. I usually find myself making constant trade-offs while coding: this function would be neater if I extracted this helper, it would be nice to have inline documentation here, this changing this would be good but would break a dozen tests... for each of those I have to determine if the additional time is worth the benefit. Claude can apply changes so much faster than me that these calculations have changed—almost any improvement is worth applying, no matter how trivial, because the time cost is so low.

Internal tools are cheap now. The new debugging interfaces were mostly written by Claude and are significantly nicer to use and look at than the hacky versions I would have knocked out myself, if I had even taken the extra time to build them.
That trick with a Markdown file full of upgrade instructions works astonishingly well—it’s the same basic idea as Claude Skills. I maintain over 100 Datasette plugins now and I expect I’ll be automating all sorts of minor upgrades in the future using this technique.

November 4, 2025 at 11:37 PM

Simon Willison

@simonwillison.net

Just sent out the October edition of my sponsors-only monthly newsletter - you can pay me $10/month to send you less!

Here's the table of contents
simonwillison.net/2025/Nov/1/s...

Coding agents and "vibe engineering"
Claude Code for web
NVIDIA DGX Spark
Claude Skills
OpenAI DevDay and GitHub Universe
Python 3.14
October in Chinese Al model releases
Miscellaneous extras
Tools I'm using at the moment

November 1, 2025 at 10:15 PM

Simon Willison

@simonwillison.net

When I asked it to figure out it how to run cmarkgfm in Pyodide it noticed that the repo already had notes on Pyodide and reviewed them before starting work on the new project gistpreview.github.io?6d778a8f9c4c...

•
Read(/home/user/research/node-pyodide/README.md)
Read (/home/user/research/node-pyodide/server-simple.js)
Read(/home/user/research/node-pyodide/package.json)
Bash (git branch -a | grep -i cmark)
Bash (git 1og --all --oneline --grep="cmark" -i)
Bash (git status)
Read (home/user/research/python-markdown-comparison/READMI
Now let me research cmarkgfm and Pyodide package building
Web Search("cmarkgfm python package structure C dependenc:
web Search("pyodide build custom C extension package 2025'

October 23, 2025 at 9:54 PM

Simon Willison

@simonwillison.net

Prompt -> Result tools.simonwillison.net/terminal-to-...

Build a new tool called terminal-to-html which lets the user copy RTF directly from their terminal and paste it into a paste area, it then produces the HTML version of that in a textarea with a copy button, below is a button that says "Save this to a Gist", and below that is a full preview. It will be very similar to the existing rtf-to-html.html tool but it doesn't show the raw RTF and it has that Save this to a Gist button

That button should do the same trick that openai-audio-output.html does, with the same use of localStorage and the same flow to get users signed in with a token if they are not already

So click the button, it asks the user to sign in if necessary, then it saves that HTML to a Gist in a file called index.html, gets back the Gist ID and shows the user the URL https://gistpreview.github.io/?6d778a8f9c4c2c005a189ff308c3bc47 - but with their gist ID in it

They can see the URL, they can click it (do not use target="_blank") and there is also a "Copy URL" button to copy it to their clipboard

Make the UI mobile friendly but also have it be courier green-text-on-black themed to reflect what it does

If the user pastes and the pasted data is available as HTML but not as RTF skip the RTF step and process the HTML directly

If the user pastes and it's only available as plain text then generate HTML that is just an open <pre> tag and their text and a closing </pre> tag

Terminal to HTML app. Green glowing text on black. Instructions: Paste terminal output below. Supports RTF, HTML or plain text. There's an HTML Code area with a Copy HTML button, Save this to a Gist and a bunch of HTML. Below is the result of save to a gist showing a URL and a Copy URL button. Below that a preview with the Claude Code heading in ASCII art.

October 23, 2025 at 4:40 AM

Simon Willison

@simonwillison.net

Asynchronous coding agents are the fastest and safest route to running coding agents in a sandbox without constant supervision

The best sandboxes run on someone else's computer

Claude Code for Web
OpenAl Codex Cloud
Gemini Jules
ChatGPT & Claude code Interpreter

October 22, 2025 at 12:41 PM

Simon Willison

@simonwillison.net

Just for fun, I had Claude Code figure out how to run the ~2001-era Perl and C SLOCCount program in WebAssembly in the browser, complete with a UI for counting source code lines from pasted text, a GitHub repository or a zip file

simonwillison.net/2025/Oct/22/...

Here's an animated demo

October 22, 2025 at 6:24 AM

Simon Willison

@simonwillison.net

It's neat to see them encourage developers to add ARIA tags to pages though, an "agent" can be thought of as effectively another form of assistive technology

There was one other detail in the announcement post that caught my eye:

Website owners can also add ARIA tags to improve how ChatGPT agent works for their websites in Atlas.

Which links to this:

ChatGPT Atlas uses ARIA tags---the same labels and roles that support screen readers---to interpret page structure and interactive elements. To improve compatibility, follow WAI-ARIA best practices by adding descriptive roles, labels, and states to interactive elements like buttons, menus, and forms. This helps ChatGPT recognize what each element does and interact with your site more accurately.

A neat reminder that AI "agents" share many of the characteristics of assistive technologies, and benefit from the same affordances.

October 21, 2025 at 6:49 PM

Simon Willison

@simonwillison.net

Here's my vibe-coded tool for displaying the Responses JSON returned from a deep research API call in a more readable way: tools.simonwillison.net/deep-researc... - built by Claude Code in this session: gist.github.com/simonw/d2251... - because life's too short to waste it digging through raw JSON

Dashboard screenshot showing metrics at top: 17 Thinking Steps, 45 Searches, 24 Pages Visited, 12 Code Executions, 180 Total Steps. Below is a blue "Thinking" section with brain emoji containing text "**Researching orchestrions**" followed by a paragraph: "I'm considering a deep dive into specific orchestrions, particularly targeting places like museums. The idea is to gather data on surviving orchestrions and produce a structured list in a JSON format. Each entry will likely include details like city, country, venue, and notes about their history and significance. I realize this could be a challenging task, as orchestrions are quite rare. The goal is to compile a comprehensive overview, so I need to identify reliable sources of information." At bottom is a beige search box with magnifying glass icon showing: Search: "surviving orchestrion" locations

October 18, 2025 at 7:31 PM

Simon Willison

@simonwillison.net

I misquoted the llama.cpp performance numbers in my original post, here's the updated section which now distinguishes between token read speed and token generation speed

llama.cpp creator Georgi Gerganov just published extensive benchmark results from running llama.cpp on a Spark. He’s getting ~3,600 tokens/second to read the prompt and ~59 tokens/second to generate a response with the MXFP4 version of GPT-OSS 20B and ~817 tokens/second to read and ~18 tokens/second to generate for GLM-4.5-Air-GGUF.

October 15, 2025 at 12:45 AM

Simon Willison

@simonwillison.net

Claude Code tip: prompt "use sub-agents" and it will do exactly that, firing up multiple parallel sub-agents to achieve a task, each with a fresh context

"Write markdown documentation for the context passed to each of the templates in this project. Use sub-agents." simonwillison.net/2025/Oct/11/...

⏺ I'll help you write markdown documentation for the context passed to each template in this project. Let me start by exploring the codebase to understand the template structure and then use sub-agents to document them comprehensively.

⏺ Search(pattern: "**/*.html")
⎿ Found 47 files (ctrl+o to expand)

⏺ Search(pattern: "render.*template|render_template")
⎿ Found 16 files (ctrl+o to expand)

[...]

⏺ Now let me launch sub-agents to analyze different template contexts in parallel:

⏺ Task(Document index template context)
⎿ Done (12 tool uses · 86.4k tokens · 1m 13s)

⏺ Task(Document database template context)
⎿ Done (15 tool uses · 55.4k tokens · 1m 41s)

⏺ Task(Document table template context)
⎿ Done (14 tool uses · 85.2k tokens · 1m 44s)

⏺ Task(Document query/row template contexts)
⎿ Done (17 tool uses · 78.7k tokens · 1m 55s)

⏺ Task(Document special template contexts)
⎿ Done (26 tool uses · 116.1k tokens · 1m 50s)

October 11, 2025 at 9:32 PM

Simon Willison

@simonwillison.net

I really enjoy being able to spin up custom visualization tools as part of writing up a short blog post like this, here's my new Graphviz URL-based DOT renderer, built as a Claude artifact in a couple of minutes tools.simonwillison.net/dot#digraph%... - transcript: claude.ai/share/2b78a9...

$Screenshot of a page - title is DOT file renderer, then it says Paste your .dot file content here with a Load example link. Then a textarea with a digraph when_to_use { ... } DOT graph, and below it that graph rendered.$

October 10, 2025 at 11:40 PM

Simon Willison

@simonwillison.net

OpenAI released a new image gen model today - gpt-image-1-mini - which they describe as "80% less expensive than the large model"

I vibe-coded a Python CLI tool to try it out. Here's "racoon eating cheese wearing a top hat, realistic photo" which cost ~0.2 cents simonwillison.net/2025/Oct/6/g...

It's a square photo of a raccoon eating cheese and wearing a top hat. It looks pretty realistic.

October 6, 2025 at 10:57 PM

Simon Willison

@simonwillison.net

Joshua Rogers, using AI tooling responsibly and professionally, reported 22+ genuine issues in curl that are now being addressed

Especially notable because curl had problems with floods of garbage slop AI "security issues" in the past that were nothing of the sort simonwillison.net/2025/Oct/2/c...

daniel:// stenberg://
@bagder@mastodon.social

Joshua Rogers sent us a *massive* list of potential issues in #curl that he found using his set of AI assisted tools. Code analyzer style nits all over. Mostly smaller bugs, but still bugs and there could be one or two actual security flaws in there. Actually truly awesome findings.

I have already landed 22(!) bugfixes thanks to this, and I have over twice that amount of issues left to go through. Wade through perhaps.

Credited "Reported in Joshua's sarif data" if you want to look for yourself

October 2, 2025 at 3:16 PM

Simon Willison

@simonwillison.net

Where it really shines is in their new claude.ai Code Interpreter mode - I had it checkout my GitHub repo, install dependencies, run tests and experiment with a complex new feature, all prompted from the web browser on my iPhone simonwillison.net/2025/Sep/29/...

Prompt: The tool stores prompts and responses in a SQLite database. It currently uses a single conversation row and a linear collection of responses within that conversation. Experiment with the idea of adding a parent_response_id column to that responses table and using it to model the responses in a conversation as a tree instead. Use new pytest tests which you execute individually (to save time on the whole test suite) to exercise your design. Write a tree_notes.md file which you first populate with your plan and then use for notes as you go along.

Thoughts:

Exploring the llm project structure
Looking at the migrations file to understand the current database schema
Looking at the logging documentation to understand how conversations and responses work

Response: I'll explore adding a tree structure to the LLM conversation system. Let me start by examining the current database schema and then create a plan.

September 29, 2025 at 6:17 PM

Simon Willison

@simonwillison.net

I shared a few more notes here, including this hopefully reassuring note for anyone afraid of the impact this will have on their career as a programmer simonwillison.net/2025/Sep/29/...

For anyone who sees this as a threat to their livelihood as programmers, I encourage you to think more about this section:

It is easy to create systems that appear to behave correctly but have unclear runtime behavior when relying on agents. For instance, the AI doesn’t fully comprehend threading or goroutines. If you don’t keep the bad decisions at bay early it, you won’t be able to operate it in a stable manner later.

Here’s an example: I asked it to build a rate limiter. It “worked” but lacked jitter and used poor storage decisions. Easy to fix if you know rate limiters, dangerous if you don’t.

In order to use these tools at this level you need to know the difference between goroutines and threads. You need to understand why a rate limiter might want to"jitter" and what that actually means. You need to understand what "rate limiting" is and why you might need it!

These tools do not replace programmers. They allow us to apply our expertise at a higher level and amplify the value we can provide to other people.

September 29, 2025 at 4:29 PM

Simon Willison

@simonwillison.net

New on Niche Musems: my write-up
of a visit to the Musical Museum in Brentford, London... player pianos, self-playing violins, and orchestrions! www.niche-museums.com/115

Two orchestrions - one in a grand old cabinet full of organ pipes, the other in a modern cabinet with visible drums and an attached piano. A green gramophone horn sits between them.

A violin encompassed in a complex set of machinery - a self-playing violin!

A grand piano with a drawer pulled out that reveals a piano roll

A very grand Wurlitzer organ on stage in a concert hall

September 21, 2025 at 3:53 PM

Simon Willison

@simonwillison.net

Leaked memo from Amazon that warns existing H1B holders to avoid travel back into the USA after the September 21st deadline - their lawyers evaluated the new executive order as not just affecting new applications

Amazonians,

We've reviewed the Presidential Proclamation on H-1B visas that was released today and are actively working to gain greater clarity. Here's what you need to know right now: The proclamation creates a travel restriction starting September 21, 2025, at 12:01 a.m. EDT (9:01 p.m. PDT tomorrow). After this deadline, individuals cannot enter the U.S. on H-1B status without an additional $100,000 payment associated with their petition. Recommended actions for you to take:

If you have H-1B status and are in the U.S.: Stay in the country for now, even if you have travel planned for the immediate future. We will continue to provide updates as more details are available.

If you have H-4 dependent status: We also recommend you remain in the U.S., though the proclamation doesn't specifically mention H-4 dependents.

If you have H-1B or H-4 status and are outside the U.S.: Try to return before tomorrow's deadline if possible. We realize this is short notice but returning soon is advisable and you should make every effort possible to clear U.S. customs before 12:00 a.m. EDT (9:00 p.m. PDT) on Sunday, September 21, 2025.

At this time, if you have an H1-B or H-4 status and are unable to return before the deadline, we advise that you do not attempt to enter the U.S. until further guidance is provided.

September 20, 2025 at 4:14 PM

Simon Willison

@simonwillison.net

The worst offenders for constantly redefining agents with new, vague and inconsistent definitions are OpenAI themselves simonwillison.net/2025/Sep/18/...

OpenAI need to get their story straight #

The single biggest source of agent definition confusion I’m aware of is OpenAI themselves.

OpenAI CEO Sam Altman is fond of calling agents “AI systems that can do work for you independently”.

Back in July OpenAI launched a product feature called “ChatGPT agent” which is actually a browser automation system—toggle that option on in ChatGPT and it can launch a real web browser and use it to interact with web pages directly.

And in March OpenAI launched an Agents SDK with libraries in Python (openai-agents) and JavaScript (@openai/agents). This one is a much closer fit to the “tools in a loop” idea.

It may be too late for OpenAI to unify their definitions at this point. I’m going to ignore their various other definitions and stick with tools in a loop!

September 18, 2025 at 7:32 PM

Simon Willison

@simonwillison.net

Includes this note about why agents as human replacements is my least favorite definition - because unlike AI agents, humans have agency!

Agents as human replacements is my least favorite definition

If you talk to non-technical business folk you may encounter a depressingly common alternative definition: agents as replacements for human staff. This often takes the form of “customer support agents”, but you’ll also see cases where people assume that there should be marketing agents, sales agents, accounting agents and more.

If someone surveys Fortune 500s about their “agent strategy” there’s a good chance that’s what is being implied. Good luck getting a clear, distinct answer from them to the question “what is an agent?” though!

This category of agent remains science fiction. If your agent strategy is to replace your human staff with some fuzzily defined AI system (most likely a system prompt and a collection of tools under the hood) you’re going to end up sorely disappointed.

That’s because there’s one key feature that remains unique to human staff: accountability. A human can take responsibility for its action and learn from its mistakes. Putting an AI agent on a performance improvement plan makes no sense at all!

Amusingly enough, humans also have agency. They can form their own goals and intentions and act autonomously to achieve them—while taking accountability for those decisions. Despite the name, AI agents can do nothing of the sort.

September 18, 2025 at 7:24 PM

Simon Willison

@simonwillison.net

It's over! Nice to see a world championship where the USA and China didn't dominate the rankings

Best Team

22st USA
3344

22nd Algeria
3287

23rd Australia
3269

24th China
3147

25th Croatia
1875

September 13, 2025 at 3:02 PM

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news