Lightnews — Scholar-powered news

Avijit Ghosh

@evijit.io

2.6K followers 720 following 360 posts

Technical AI Policy Researcher at HuggingFace @hf.co 🤗. Current focus: Responsible AI, AI for Science, and @eval-eval.bsky.social!


Avijit Ghosh

@evijit.io

A (very incomplete) frontend of Eval Cards can be found here: evalcards.evalevalai.com, and we are now collecting eval datasets (to show in Eval Cards) on GitHub: github.com/evaleval/eve...

If you want to help see eval cards come alive, get in touch!

AI Evaluation Dashboard

Professional AI system evaluation and assessment tool

evalcards.evalevalai.com

November 13, 2025 at 2:35 PM

Avijit Ghosh

@evijit.io

Finally, what's next from here? Almost every developer we spoke to said that what we need is a standardized way of reporting, aggregating and comparing all the evals done by both 1st and 3rd parties for a model. This is actually our next project: Eval Cards!

November 13, 2025 at 2:35 PM

Avijit Ghosh

@evijit.io

Incredible work done with literally the smartest and most passionate researchers I am lucky to work with. Paper co-led with @ankareuel.bsky.social and Jenny Chim, and other co-authors!

November 13, 2025 at 2:35 PM

Avijit Ghosh

@evijit.io

Read the detailed results here: arxiv.org/abs/2511.05613

We also release the code, and the full annotated dataset on Hugging Face (link in paper).

Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations

Foundation models are increasingly central to high-stakes AI systems, and governance frameworks now depend on evaluations to assess their risks and capabilities. Although general capability evaluation...

arxiv.org

November 13, 2025 at 2:35 PM

Avijit Ghosh

@evijit.io

This only strengthens our position that good-quality, independent third-party evaluations are paramount for AI safety.

November 13, 2025 at 2:35 PM

Avijit Ghosh

@evijit.io

First-party reports are less transparent or lower quality. We conducted interviews with eval practitioners and found that companies have laid off or reassigned teams dedicated to documentation & social impact evals, or they are being told to focus more on capability reporting.

November 13, 2025 at 2:35 PM

Avijit Ghosh

@evijit.io

This is true even at the provider level. We find, for example, that Google did a lot more reporting about its model evaluations in 2022 and 2023 but reduced reporting in the Gemini era, and the same can be seen for Meta over successive Llama versions.

November 13, 2025 at 2:35 PM

Avijit Ghosh

@evijit.io

We find that model developers have become less transparent about their eval results over time. For instance, environmental cost reporting in first-party reports (release docs, model cards, system cards) has drastically declined over time. Less than 15% mention labor or the environment!

November 13, 2025 at 2:35 PM

Avijit Ghosh

@evijit.io

We take a look at the entire eval landscape, specifically social impact evals across 7 dimensions: Bias & Harm, Sensitive Content, Performance Disparity, Env. Costs & Emissions, Privacy & Data, Financial Costs, and Moderation Labor. Who is reporting these evals?

November 13, 2025 at 2:35 PM

Avijit Ghosh

@evijit.io

… this looks like the Nature font oh no

November 11, 2025 at 10:54 PM

Avijit Ghosh

@evijit.io

The thing about non-survey papers is that they can still be problematic, fake science, etc., and arXiv needs a long-overdue, moderated comments section

November 1, 2025 at 4:41 PM

Avijit Ghosh

@evijit.io

Yes! The Science/Tech/Cyber committee is doing really good work too. Well intentioned folks there trying to actually engage with researchers and industry folks. Love MA

October 24, 2025 at 7:05 PM

Avijit Ghosh

@evijit.io

Oof

October 20, 2025 at 9:33 PM

Avijit Ghosh

@evijit.io

I have started requesting that panel moderators provide a disclaimer at panels I am on that not all my opinions are those of my employer. HF ppl largely believe in democratization of AI and open source, but we actually have intense, healthy debates internally on edge topics! It's great :)

October 20, 2025 at 7:01 PM

Avijit Ghosh

@evijit.io

Huh, so interesting re: art therapy!

Re: turning off adult content, this is already what Google does (SafeSearch on, off, or blurred, off by default). I do think it gives back agency to adult users without shaming sexual content from a puritan perspective.

October 20, 2025 at 6:45 PM

Avijit Ghosh

@evijit.io

This doesn’t quite answer what I’m asking. Currently there’s nothing preventing people from going to AO3, Literotica, etc. Should those be banned too? What is it about porn specifically that seems to be the problem (as opposed to harms of personification/emotional attachment)?

October 20, 2025 at 4:15 PM
