Speaking 🇫🇷, English and 🇨🇱 Spanish | Living in Tübingen 🇩🇪 | he/him
https://gubri.eu
📍 Come chat with Tommaso at our poster on Friday 7th, 10:30–12:00 in Hall C3
📄 aclanthology.org/2025.emnlp-m...
Evaluating large models on benchmarks like MMLU is expensive. DISCO cuts costs by up to 99% while still predicting performance well.
🔍 The trick: use a small subset of samples where models disagree the most. These are the most informative.
Join the dance party below 👇
Huge congrats to the amazing team Tommaso Green, Haritz Puerto @coallaoh.bsky.social @oodgnas.bsky.social
🎉 Excited to share that our paper "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" was accepted at #EMNLP main!
1/2
An enormous amount of work showing the extent of coordinated scientific fraud and involvement of some editors.
The number of fraudulent publications grows at a rate far outpacing that of legitimate science.
www.pnas.org/doi/10.1073/...
We introduce C-SEO Bench, a benchmark to test if conversational SEO methods actually help.
Our finding? They don't. But traditional SEO still works because LLMs favour content already ranked higher in the prompt.
Excited to announce our new paper: C-SEO Bench: Does Conversational SEO Work?
🌐 RTAI: researchtrend.ai/papers/2506....
📄 Paper: arxiv.org/abs/2506.11097
💻 Code: github.com/parameterlab...
📊 Data: huggingface.co/datasets/par...
🖼️ Catch us at Poster Session 8 - APP: NLP Applications
🗓️ May 2, 11:00 AM - 12:30 PM
🗺️ Hall 3
Hope to see you there!
At Parameter Lab, we believe openness and reproducibility are essential for advancing science, and we've put in our best effort to ensure it.
🤗 huggingface.co/collections/...
🧵 bsky.app/profile/dnns...
Parameter Lab with Martin Gubri, Sangdoo Yun, Hwaran Lee, Seong Joon Oh! arxiv.org/abs/2403.059... [1/6]
Would love to hear any other tips if you have them!
This proved very popular on another (more evil) social media platform, so sharing here also 🙂
My 10 tips:
🦹💥 We explore how to detect if an LLM was stolen or leaked🤖💥
We showcase how to use adversarial prompts as a #fingerprint for an #LLM.
A thread 🧵
⬇️⬇️⬇️