Our new paper, 📎“Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations,” analyzes hundreds of evaluation reports and reveals major blind spots ‼️🧵 (1/7)
This is the first big project output from the @eval-eval.bsky.social coalition! Thread below:
We have a rock-star lineup of AI researchers and an amazing program. Please RSVP as soon as you can! Stay tuned!
You’d say “Y’all. Not helping. What you need is obviously a labor movement.”
Request to join below! :)
evaleval.github.io/events/works...
github.com/huggingface/...
🤖 Did you know malicious actors can exploit trust in AI leaderboards to promote poisoned models in the community?
This week's paper 📜"Exploiting Leaderboards for Large-Scale Distribution of Malicious Models" by @iamgroot42.bsky.social explores this!
First up: Do Large Language Model Benchmarks Test Reliability?
🕵️ Are benchmark noise and label errors masking the true fragility of LLMs?
🖇️"Do Large Language Model Benchmarks Test Reliability?" - This paper by @joshvendrow.bsky.social provides insights!
www.theverge.com/news/787076/...
📰🗞💥
www.eventbrite.com/e/personal-a...
In some cases, ChatGPT-enmeshed spouses are using the tech to bully their partners.
futurism.com/chatgpt-marr...
huggingface.co/blog/waterma...