Visit https://zhuhao.me
Raising agents in the Opensocial.world
With EgoNormia, a 1.8k ego-centric video 🥽 QA benchmark, we show that this is surprisingly challenging!
It will highlight long-standing and emerging challenges of AI interacting with humans, society & the world.
⏰ May 3, 2:00pm-5:30pm Room Pecos
Out in Child Development:
"Learning Loopholes: The Development of Intentional
Misunderstandings in Children"
paper: srcd.onlinelibrary.wiley.com/doi/10.1111/...
preprint-pdf: www.tomerullman.org/papers/kids_...
We address data limitations and offer a fresh evaluation method for these complex queries.
Curious how TREC TOT track test queries are created? Check out this thread 🧵 and our paper 📄: arxiv.org/abs/2502.17776
But humans are naturally quite good at this (>90% acc.)
Check it out!
➡️ arxiv.org/abs/2502.20490
We propose methods for training LLMs with open-ended, unsupervised interaction on live websites:
✅ OSS SoTA on WebVoyager
✅ world's smallest high-performing web-agent
Try it here: nnetnav.dev
Talk Arena is our first step towards building audio LMs into interactive agents. Try it out and let me know what you think. talkarena.org
Introducing talkarena.org — an open platform where users speak to LAMs and receive text responses. Through open interaction, we focus on rankings based on user preferences rather than static benchmarks.
🧵 (1/5)