It's only January and we have a half order of magnitude improvement in real-world performance, and yet people think progress is slowing down?
Really, we are running out of benchmarks.
For those who don't know, in 2017-18, seven Australian politicians had to resign and/or run again because it turned out they could be dual citizens (sometimes unknowingly).
This change in Canadian law could trigger more!
Wired: Number of tokens to implement feature estimates
LLM Comparator lets you try the same prompt on multiple LLMs, and continue the conversation across them.
tools.nicklothian.com/llm_comparat...
Do the math! If you can rent a cloud H100 and run it above 50% utilization for something approaching the per-token price inference providers are charging, then it's easy to see they aren't.
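Here is a rough sketch of that math. It assumes you know three things: the hourly rental price of one GPU, the aggregate (batched) token throughput it sustains, and the provider's price per million tokens. The values in the usage line are hypothetical placeholders, not measurements of any real H100 or provider.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float,
                            utilization: float) -> float:
    """Serving cost in USD per million tokens for one rented GPU.

    utilization is the fraction of each hour the GPU spends serving at
    tokens_per_second (0 < utilization <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


def break_even_utilization(gpu_hourly_usd: float, tokens_per_second: float,
                           price_per_million_usd: float) -> float:
    """Utilization at which rental cost equals revenue at the given token price."""
    return gpu_hourly_usd * 1_000_000 / (tokens_per_second * 3600 * price_per_million_usd)


# Hypothetical inputs: $2.00/hour GPU, 1,000 tokens/s batched, 50% utilization.
print(round(cost_per_million_tokens(2.0, 1000, 0.5), 2))
```

If `break_even_utilization` comes out well below the utilization you can realistically sustain, then the quoted per-token price covers the hardware rental.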
github.com/serengil/Lig...
- Graph traversal
- Tower of Hanoi
Graph traversal: significant improvements over zero-shot prompting or in-context learning.
Notably, the architecture lowers hallucination of invalid moves to 0% (from ~20% for 4-step paths!)
4/n
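The thread doesn't say how invalid moves are counted, so here is a hypothetical checker for the two tasks. It is not the architecture described above, just one way to score a proposed solution. It assumes the graph is given as an adjacency dict and that Tower of Hanoi moves are (source peg, destination peg) pairs using pegs 0-2. The function names are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # node -> set of nodes reachable in one move


def invalid_move_rate(graph: Graph, path: List[str]) -> float:
    """Fraction of steps in path that do not follow an edge of graph."""
    steps = list(zip(path, path[1:]))
    if not steps:
        return 0.0
    invalid = sum(1 for u, v in steps if v not in graph.get(u, set()))
    return invalid / len(steps)


def hanoi_invalid_moves(n_disks: int, moves: List[Tuple[int, int]]) -> int:
    """Count illegal moves: bad peg index, same peg, empty source, or a
    larger disk placed on a smaller one. Illegal moves leave the state
    unchanged so that later moves are still checked."""
    pegs: List[List[int]] = [list(range(n_disks, 0, -1)), [], []]
    invalid = 0
    for src, dst in moves:
        if (not (0 <= src < 3 and 0 <= dst < 3) or src == dst
                or not pegs[src]
                or (pegs[dst] and pegs[dst][-1] < pegs[src][-1])):
            invalid += 1
            continue
        pegs[dst].append(pegs[src].pop())
    return invalid


g: Graph = {"A": {"B"}, "B": {"C"}, "C": set()}
print(invalid_move_rate(g, ["A", "B", "A"]))  # B -> A is not an edge
```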
colinmorris.github.io/blog/compoun...