tadamcz.com
📍London
Codewars post-training detected
e.g. I have given 34 full hours of my life to Taylor Swift's "evermore"
Google has no answers
It accounts for the present value of your future savings, not just what you have now.
This is a SUPER common mistake with uncertainty modelling. I've seen it many times from Carlo customers.
???
So we can remove some special-case code in our FrontierMath eval!
If you are the engineer at Anthropic who worked on this, show your boss this tweet
OpenAI's much-hyped "o3" model is so dumb, it doesn't even know
on what day, month, and year Olton Willem van Genderen (the Surinamese civil servant and politician) died!
Yet, Dario Amodei has been silent about this. Curious.
Completely correct answer I hadn't considered at all, and that might have taken me hours (days?) to find.
Wrong hypotheses in my prompt didn't sidetrack the AI
Today's AI is smart enough to find the bug in the React slop it wrote 7 months ago.
Eyeballing the plot, the SOTA improvement seems to be slowing down, compared to the progress we saw between Sonnet 3.5 and Opus 4.
pass@the-kitchen-sink
On a benchmark, count all problems that _any_ LLM/scaffold/system has ever solved at least once.
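A minimal sketch of the pass@the-kitchen-sink count, assuming you have, for each LLM/scaffold/system, the set of problem IDs it has ever solved at least once (any run, any attempt). The metric is then just the size of the union of those sets, restricted to the benchmark. The function name, the data layout, and the example system/problem IDs are all hypothetical, for illustration only.

```python
from typing import Mapping, Set


def pass_at_kitchen_sink(solved_by_system: Mapping[str, Set[str]], benchmark: Set[str]) -> int:
    """Count benchmark problems solved at least once by ANY system (hypothetical helper)."""
    solved_anywhere: Set[str] = set()
    for solved in solved_by_system.values():
        # Union: a problem counts if any system ever solved it
        solved_anywhere |= solved
    # Ignore IDs that aren't part of this benchmark
    return len(solved_anywhere & benchmark)


if __name__ == "__main__":
    # Hypothetical systems and problem IDs
    records = {
        "model_a": {"p1", "p2"},
        "model_b": {"p2", "p3"},
        "scaffold_c": {"p5"},
    }
    problems = {"p1", "p2", "p3", "p4", "p5"}
    count = pass_at_kitchen_sink(records, problems)
    print(count, "/", len(problems))
```

Here no single system solves more than 2 of the 5 problems, but the kitchen-sink count is 4: only p4 has never been solved by anything.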
Automate the unionized fuckers away. It Just Works.