Max Caldwell
growth.wtf
@growth.wtf
Bootstrapping a software company. In my free time I try to do things with data and statistics, or post random thoughts here.
Not winning any acting awards but it feels like that can be fixed!
July 27, 2025 at 4:10 PM
...I would look at ARC-AGI as they are one benchmark where there has not been significant progress with new model releases—Opus 4 is worse than o3 and Gemini 2.5. I find that to be strong evidence of a slowdown.
July 25, 2025 at 4:10 PM
If you look at longer-standing benchmarks like MMLU, there's absolutely a huge slowdown. I was listening to the head of product for Anthropic yesterday, and he swears there's no slowdown and points to all of the new benchmarks that have been invented and beaten. Personally...
July 25, 2025 at 4:07 PM
Does it play well with the Apache ecosystem too? Arrow interop would be cool.
July 25, 2025 at 4:01 PM
Dangerous living for a Saturday morning, Tim!
March 22, 2025 at 2:36 PM
Yes, the top of the whole feed. For everyone.
March 12, 2025 at 6:52 PM
I had to scroll down my Bluesky feed too far to find this. They should just pin you to the top or something.
March 12, 2025 at 6:52 PM
Seems useful! I will play with it myself
March 9, 2025 at 11:29 PM
4.5 would be too touchy-feely to put up a good fight. My money's on o3-mini.
March 5, 2025 at 4:32 PM
14k to have Hugging Face yell at you about MPS non-stop for another 2 years? No thanks.
March 5, 2025 at 4:28 PM
Straight up, yesterday I was searching for the openapi.json for a server I'm using. GitHub kept serving references to OpenAI in the same server repo. Super annoying.
March 5, 2025 at 4:27 PM
Oh wow. End of an era.
March 5, 2025 at 4:26 PM
End of an era I guess. Not bad for their last non-reasoning model.
February 27, 2025 at 8:21 PM
I'm being very critical. It seems like it's probably a great model. I'll use it a lot.
February 27, 2025 at 8:16 PM
And that's it? Ok.
February 27, 2025 at 8:14 PM
Ok, 3x improvement on SWE-Lancer over o3? Now we're talking.
February 27, 2025 at 8:11 PM