youtu.be/AUAHkhOldx8
Some thoughts below:
- Interesting that they use such a deep architecture for such small models (64 layers for 56M and 80 layers for 321M parameters)
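For a sense of how narrow those layers must be, here's a back-of-the-envelope using the standard ~12·L·d² non-embedding parameter approximation for dense transformers (an estimate only; the actual architecture may differ):

```python
# Rough sanity check: a dense transformer block has ~12 * d^2
# non-embedding parameters (4*d^2 attention + 8*d^2 for a 4x MLP),
# so total ~= 12 * n_layers * d^2. Back out the implied hidden size d.
import math

for n_params, n_layers in [(56e6, 64), (321e6, 80)]:
    d = math.sqrt(n_params / (12 * n_layers))
    print(f"{n_params / 1e6:.0f}M params, {n_layers} layers -> d ~ {d:.0f}")
# -> d ~ 270 for the 56M model, d ~ 578 for the 321M one
```

Hidden sizes in the few hundreds at 64-80 layers is an unusually skinny aspect ratio for models this small.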
TLDR: AGI is defined through ten measurable cognitive domains using psychometric theory.
Very cool collection of retrieval datasets all available on the Hugging Face hub!
Great work by Umar Butler, Abdur-Rahman Butler, Adrian Lucas Malec!
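If you want to poke at them, loading any one from the hub is a one-liner (the repo ID below is a hypothetical placeholder, not an actual dataset name from the collection):

```python
from datasets import load_dataset

# "some-org/some-retrieval-dataset" is a placeholder; substitute a
# real repo ID from the collection on the Hugging Face hub.
ds = load_dataset("some-org/some-retrieval-dataset", split="train")
print(ds[0])
```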
Apparently they implemented hybrid search using their own fine-tuned ModernBERT model, which is publicly available on the Hugging Face hub!
Congrats to @michaeljaylissner and the Free Law Project for making this happen!
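For anyone curious what hybrid search looks like in code, here's a minimal sketch combining BM25 with dense embeddings via reciprocal rank fusion; the model ID, corpus, and fusion constant are all illustrative assumptions, not the Free Law Project's actual setup:

```python
# Minimal hybrid-search sketch: lexical BM25 + dense embeddings,
# merged with reciprocal rank fusion (RRF). The model ID is a
# placeholder, not the Free Law Project's actual fine-tune.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

docs = [
    "The court granted the motion to dismiss.",
    "The appeal was denied for lack of standing.",
    "Summary judgment was entered for the defendant.",
]
query = "motion to dismiss granted"

# Lexical side: BM25 over whitespace-tokenized text.
bm25 = BM25Okapi([d.lower().split() for d in docs])
lex_scores = bm25.get_scores(query.lower().split())
lex_rank = sorted(range(len(docs)), key=lambda i: -lex_scores[i])

# Dense side: cosine similarity of normalized embeddings.
model = SentenceTransformer("answerdotai/ModernBERT-base")  # placeholder model
doc_emb = model.encode(docs, normalize_embeddings=True)
q_emb = model.encode(query, normalize_embeddings=True)
dense_rank = sorted(range(len(docs)), key=lambda i: -(doc_emb[i] @ q_emb))

# RRF merges the two rankings without tuning score scales
# (k=60 is the conventional constant).
k = 60
rrf = {i: 1 / (k + lex_rank.index(i) + 1) + 1 / (k + dense_rank.index(i) + 1)
       for i in range(len(docs))}
print(docs[max(rrf, key=rrf.get)])
```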
RL gives sparse feedback and burns compute. Off-policy distillation is efficient but learns in the teacher's states, not the student's, causing compounding errors on long sequences.
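The natural middle ground (and I'd guess where this goes) is on-policy distillation: the student samples its own trajectories, so there's no distribution mismatch, while the teacher scores every token, so the feedback is dense. A minimal sketch with hypothetical HF-style `student`/`teacher` causal LMs, an illustration of the idea rather than any specific recipe:

```python
# On-policy distillation sketch: the student samples its own
# completions; the teacher provides dense per-token supervision via
# reverse KL. `student` and `teacher` are assumed HF-style causal LMs.
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student, teacher, prompt_ids):
    # 1. Sample from the student, so training happens in states the
    #    student actually visits (no compounding-error mismatch).
    with torch.no_grad():
        seq = student.generate(prompt_ids, max_new_tokens=64, do_sample=True)

    # 2. Score the sampled sequence under both models.
    s_logp = F.log_softmax(student(seq).logits[:, :-1], dim=-1)
    with torch.no_grad():
        t_logp = F.log_softmax(teacher(seq).logits[:, :-1], dim=-1)

    # 3. Reverse KL per position: a dense learning signal at every
    #    token, unlike a single sparse RL reward at the end.
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(-1)
    completion_start = prompt_ids.shape[1] - 1
    return kl[:, completion_start:].mean()
```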
TLDR: Models ignore user instructions while reasoning despite following them in final outputs.
AA-LCR is a set of 100 tough questions where you need to piece together answers from several real-world documents—sometimes really big ones—so you can’t just copy and paste the answers.
Thanks to Gian Sbetta and Edouard Treccani for inviting me to a great first AI Builders event in Zurich this evening!
Had lots of great conversations with super interesting people!
TLDR: It is very strong on GPQA, especially for its size, but underperforms on LEXam.
Thanks to Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro von Werra, and Thomas Wolf!
We're excited to share our latest LEXam evaluation results:
- GPT-5 claims the #1 position, outperforming Gemini 2.5 Pro and setting a new state-of-the-art for legal reasoning on LEXam!
⚙️ The Setup
I evaluated ten frontier models on LEXam (English MC subset) using an "I don't know" (IDK) protocol.
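For readers unfamiliar with the setup, here's a minimal sketch of IDK-style scoring (the penalty is chosen so blind guessing has zero expected value; the exact rubric may differ):

```python
# IDK-protocol scoring sketch: models may answer "IDK" instead of
# guessing. Correct = +1, IDK = 0, wrong = -1/(n_options - 1). With 4
# options a blind guess earns 0.25*1 + 0.75*(-1/3) = 0 in expectation,
# so abstaining is rational whenever the model is below break-even.
def idk_score(pred: str, gold: str, n_options: int = 4) -> float:
    if pred == "IDK":
        return 0.0
    return 1.0 if pred == gold else -1.0 / (n_options - 1)
```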
"GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning" presents such elegant ideas by a collection of amazing researchers!
Here is a tldr of how it works:
"GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning" presents such elegant ideas by a collection of amazing researchers!
Here is a tldr of how it works:
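At a high level, the loop looks like this (my sketch; `evaluate` and `reflect` are caller-supplied, LLM-backed functions, not names from the paper):

```python
import random

def gepa(seed_prompt, examples, evaluate, reflect, budget=20):
    """Schematic GEPA loop. evaluate(prompt, ex) -> (score, trace);
    reflect(prompt, traces) -> mutated prompt (an LLM reads the
    execution traces and proposes a targeted natural-language edit)."""

    def dominates(a, b):  # Pareto dominance over per-example scores
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    seed_scores = [evaluate(seed_prompt, ex)[0] for ex in examples]
    frontier = [(seed_prompt, seed_scores)]  # Pareto set of candidates
    for _ in range(budget):
        prompt, _ = random.choice(frontier)
        traces = [evaluate(prompt, ex)[1] for ex in examples]
        candidate = reflect(prompt, traces)  # reflective mutation
        scores = [evaluate(candidate, ex)[0] for ex in examples]
        # Keep the frontier Pareto-optimal across examples, which
        # preserves prompts that win on different problem types.
        frontier = [(p, s) for p, s in frontier if not dominates(scores, s)]
        if not any(dominates(s, scores) for _, s in frontier):
            frontier.append((candidate, scores))
    return max(frontier, key=lambda ps: sum(ps[1]))[0]  # best mean score
```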
Yoshua Bengio, the most-cited computer scientist in the world, is 1 week away from becoming the first ML researcher to hit 1 million citations! 🤯
At his current rate of 366 citations/day, he'll reach this unprecedented milestone around October 27th 🎯
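Quick back-of-the-envelope on the projection:

```python
# ~1 week at 366 citations/day is ~2,562 citations, so the implied
# current count is just shy of 997.5k.
rate, days_left = 366, 7
print(1_000_000 - rate * days_left)  # 997438
```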
Sycophancy, the phenomenon of excessively agreeing with or flattering users, is a pervasive issue in current LLMs.
Findings: