I think this is what @simonwillison.net and @emollick.bsky.social were getting at. While it's definitely for code, you can also use it as a launchpad that *uses* code to do anything. Single-purpose code
Our systems are very much not ready for the revelation that this is no longer true, as this planning objection AI shows
It also featured the best version of “I spoke to a local farmer about a data center”
You’d say “Y’all. Not helping. What you need is obviously a labor movement.”
GPT-5 is nearly twice as expensive as Sonnet 4.5 at the same task, simply because Sonnet 4.5 is the better model.
Don't use token prices alone to pick a model!
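The point is that what you pay per task is tokens consumed × price per token. A model that needs more steps or more retries can cost more overall even when its per-token price is lower. Here is a minimal arithmetic sketch. The token counts and prices are made up for illustration and are not measurements of GPT-5, Sonnet 4.5, or any other real model.

```python
# Hypothetical illustration: total task cost depends on tokens consumed,
# not just the per-token price. All numbers below are invented.

def task_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one task, given token counts and per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Model A: cheaper per token, but takes more steps (more tokens) to finish.
cheap_per_token = task_cost(input_tokens=2_000_000, output_tokens=200_000,
                            input_price_per_m=1.25, output_price_per_m=10.0)

# Model B: pricier per token, but finishes the same task in fewer tokens.
pricey_per_token = task_cost(input_tokens=600_000, output_tokens=60_000,
                             input_price_per_m=3.0, output_price_per_m=15.0)

print(f"A: ${cheap_per_token:.2f}, B: ${pricey_per_token:.2f}")  # A: $4.50, B: $2.70
```

With these invented numbers, the "cheaper" model ends up costing about 1.7× as much per task.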
Context-Bench evaluates how well language models can chain file operations, trace entity relationships, and manage long-horizon multi-step tool calling.
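To make "chain file operations and trace entity relationships" more concrete, here is a hypothetical toy multi-hop lookup. Each file read reveals which file to open next, so the answer is only reachable by chaining tool calls. This is not how Context-Bench is built or scored. The file names, contents, and the `read_file` and `trace` helpers are all invented for illustration.

```python
# Hypothetical toy task: answering "which office is Alice's manager's team in?"
# requires several chained reads, each one naming the next file to open.
files = {
    "people/alice.txt": "manager: people/bob.txt",
    "people/bob.txt": "team: teams/infra.txt",
    "teams/infra.txt": "office: Berlin",
}

def read_file(path: str) -> str:
    """Stand-in for a single file-read tool call."""
    return files[path]

def trace(start: str, max_hops: int = 10) -> list[str]:
    """Follow 'key: value' pointers until the value is no longer a file path."""
    path = start
    trail = []
    for _ in range(max_hops):
        key, value = read_file(path).split(": ", 1)
        trail.append(f"{key}={value}")
        if value not in files:  # reached a plain fact rather than another file
            return trail
        path = value
    raise RuntimeError("exceeded max_hops without reaching an answer")

print(trace("people/alice.txt"))
# ['manager=people/bob.txt', 'team=teams/infra.txt', 'office=Berlin']
```

A single wrong hop anywhere in the chain produces a wrong final answer, which is what makes long-horizon tool use harder than one-shot retrieval.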
Letta Code is our take on the terminal-based coding assistant, but with state and learning built in.
No more compactions. Just specialist agents that learn your code with every commit.
Moonshot AI's Kimi Infra team dropped K2 Vendor Verifier, which lets you visually compare tool call accuracy across providers on OpenRouter. TogetherAI looks really bad.
github.com/MoonshotAI/K...
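Here is one simple way to put a number on "tool call accuracy" across providers. The sketch serves the same model through each provider, collects the raw tool-call argument strings, and counts how many are valid JSON objects containing the required fields. This is only an assumption about what such a metric might look like. It is not K2 Vendor Verifier's actual methodology, and the schema, provider names, and responses are all made up.

```python
import json

# Hypothetical schema for a hypothetical get_weather tool: "city" is required.
REQUIRED_KEYS = {"city"}

def is_valid_call(raw_args: str) -> bool:
    """True if the tool-call arguments parse as a JSON object with the required keys."""
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return False
    return isinstance(args, dict) and REQUIRED_KEYS.issubset(args)

def accuracy(calls: list[str]) -> float:
    """Fraction of tool calls that are valid; 0.0 if there are none."""
    return sum(is_valid_call(c) for c in calls) / len(calls) if calls else 0.0

# Made-up outputs from two hypothetical providers serving the same model.
responses = {
    "provider_a": ['{"city": "Paris"}', '{"city": "Tokyo"}', '{"city": "Lima"}', '{"city": "Oslo"}'],
    "provider_b": ['{"city": "Paris"}', '{"town": "Tokyo"}', '{"city": ', 'not json'],
}

for provider, calls in responses.items():
    print(f"{provider}: {accuracy(calls):.0%}")  # provider_a: 100%, provider_b: 25%
```

Because the model weights are the same in both cases, a gap like this would point to differences in serving, such as quantization, chat templates, or tool-call parsing, rather than in the model itself.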