Academic: language grounding, vision+language, interp, rigorous & creative evals, cogsci
Other: many sports, urban explorations, puzzles/quizzes
bennokrojer.com
It covered a lot, from crowdworker rights, the ideologies (doomers, EA, ...) and the Silicon Valley startup world to the many big egos and company-internal battles
Great work by @karenhao.bsky.social
love the detailed montreal spots mentioned
consider including such a section in your next appendix!
(paper by @a-krishnan.bsky.social arxiv.org/pdf/2504.050...)
Finally the video from Mila's speed science competition is on YouTube!
From a soup of raw pixels to abstract meaning
t.co/RDpu1kR7jM
There's a lot of talk about math reasoning these days, but this project made me appreciate what simple reasoning we humans take for granted, arising in our first months and years of living
As usual i also included "Behind The Scenes" in the Appendix:
(Mido Assran Nicolas Ballas @koustuvsinha.com @candaceross.bsky.social @quentin-garrido.bsky.social Mojtaba Komeili)
The Montreal office in general is a very fun place 👇
We encourage the community to use MVPBench to check if the latest VideoLLMs possess a *real* understanding of the physical world!
In total we analyze 4 such shortcuts and find that model scores often don't change much:
We ask 🤔
What subtle shortcuts are VideoLLMs taking on spatio-temporal questions?
And how can we instead curate shortcut-robust examples at large scale?
We release: MVPBench
Details 👇🔬
Love these interactive maps
Here we want to know at what point the model resolves that "it" refers to "Patchscopes" --> you could apply logitlens to it (www.lesswrong.com/posts/AcKRB8...) or... 1/2
(the original mega-thread has become too long and nested so reposting now as a new strategy)
Patchscopes: A Unifying Framework for Inspecting
Hidden Representations of Language Models
A few notes below 👇 I took fewer digital notes this time as I was sitting outside in the sun reading 🌞
1) maybe people *outside* of a field only see the field's best papers and thus think it is impactful while people *inside* the field are exposed to all the chaotic average papers
There's tons of interesting nuanced insights in the paper, so here are just some I noted down:
--> this is called a mixed-methods analysis in social sciences
Arguably the hardest challenge is how to define what "interpretability and analysis" means. They adopt a quite broad definition but do a good job imo, also including some eval work