Avijit Ghosh
@evijit.io
Technical AI Policy Researcher at HuggingFace @hf.co 🤗. Current focus: Responsible AI, AI for Science, and @eval-eval.bsky.social‬!
Incredible work done with literally the smartest and most passionate researchers I am lucky to work with. Paper co-led with @ankareuel.bsky.social and Jenny Chim, and other co-authors!
November 13, 2025 at 2:35 PM
This only strengthens our position that good-quality, independent third-party evaluations are paramount for AI safety.
November 13, 2025 at 2:35 PM
First-party reports are less transparent or of lower quality. We conducted interviews with eval practitioners and found that companies have laid off or reassigned teams dedicated to documentation and social-impact evals, or those teams are being told to focus more on capability reporting.
November 13, 2025 at 2:35 PM
This is true even at the provider level. We find, for example, that Google used to report much more about its model evaluations in 2022 and 2023 but reduced that reporting in the Gemini era, and the same can be seen for Meta over successive Llama versions.
November 13, 2025 at 2:35 PM
We find that model developers have become less transparent about their eval results over time. For instance, environmental-cost reporting in first-party reports (release docs, model cards, system cards) has drastically declined: fewer than 15% mention labor or the environment!
November 13, 2025 at 2:35 PM
Extremely thrilled to talk about our new paper: "Who Evaluates AI’s Social Impacts? Mapping Coverage And Gaps In First And Third Party Evaluations".

This is the first big project output from the @eval-eval.bsky.social coalition! Thread below:
November 13, 2025 at 2:35 PM
Trying to start a new hobby and the internet is useless. Maybe AI will finally kill unstructured information retrieval for good and then we will be forced to call or visit friends for help again
October 12, 2025 at 9:17 PM
We are launching Hugging Science: A global community addressing these barriers through:
✅ Collaborative challenges targeting upstream problems
✅ Cross-disciplinary education
✅ Recognition for data & infrastructure work
✅ Community-owned infrastructure

All links follow 🤗
October 6, 2025 at 4:28 PM
AI for scientific discovery is a social problem: In our new position paper, @cgeorgiaw.bsky.social and I show that culture, incentives, and coordination are the main obstacles to progress, and we are launching the Hugging Science Initiative to address this!
October 6, 2025 at 4:28 PM
So fascinating (not really) to me that company execs and tier 1 AI conferences have gone in completely opposite directions as it relates to AI usage. Surely the best minds actually developing AI models know something about overreliance, productivity, and quality? Surely?
September 28, 2025 at 4:03 PM
How does Claude have the same response? This is sus
September 6, 2025 at 4:30 PM
These official ones are hideous oh god
August 31, 2025 at 4:28 AM
I genuinely want to know the thought process here. Is each model iteration a new being? Is Claude 4.1 its own legal entity deserving of model welfare different from 4.0? Or is it like one human updating their world knowledge and becoming smarter? Was the very first trained Claude the robot embryo?
August 17, 2025 at 2:13 PM
The product decision to discontinue older versions of ChatGPT and the comments on Reddit around that decision reminded me once again of discussions around “robot death”, which is real insofar as people’s feelings and emotions are real.
August 8, 2025 at 5:24 PM
[New] Husbandposting! And yes we had a croquembouche for dessert because I saw it on masterchef once and I’ve always wanted that ❤️
July 15, 2025 at 7:20 PM
Who are the most prolific contributors? Research institutions lead: AI2 (the Allen Institute) emerges as one of the most active, alongside significant activity from IBM, NVIDIA, and international organizations. The open source ecosystem spans far beyond Big Tech!
July 15, 2025 at 2:31 PM
Let's also talk about datasets:

- Most downloaded datasets are evaluation benchmarks (MMLU, SQuAD, GLUE)
- Universities and research institutions dominate foundational data
- Domain-specific datasets thrive in finance, healthcare, robotics, and science
- Open datasets power most AI development!
July 15, 2025 at 2:31 PM
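If you want to sanity-check these download rankings yourself, here's a minimal sketch using the public huggingface_hub client. The sort/direction arguments and the downloads field are assumed to behave as in recent library versions, so treat it as illustrative rather than the exact pipeline behind these numbers:

```python
from huggingface_hub import HfApi

api = HfApi()

# Top datasets on the Hub by all-time downloads; evaluation benchmarks
# like MMLU, SQuAD, and GLUE tend to sit near the top of this list.
# Note: the `downloads` attribute may be None on older library versions.
for ds in api.list_datasets(sort="downloads", direction=-1, limit=10):
    print(ds.id, ds.downloads)
```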
Looking at a single model's stats often does not tell the full story of its usefulness. The Qwen, Llama, and Gemma models have led to a universe of derivative models on the Hub, all made by the community. This is the beauty of open source!
July 15, 2025 at 2:31 PM
Legacy models like CLIP, GPT-2, BERT, etc. remain among the most downloaded models despite being years old, showing that modern chat interfaces represent just one slice of AI applications! The ecosystem is much more diverse than frontier model discussions suggest.
July 15, 2025 at 2:31 PM
Small models consistently outperform large variants in downloads, even within the same model family.
This suggests practical deployment considerations often matter more than maximum capability. The community is building for real-world use, not just benchmarks.
July 15, 2025 at 2:31 PM
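A similarly rough way to eyeball the small-vs-large pattern within one family: pull the most-downloaded repos under a single org and compare the parameter counts in the repo names. The org used here (Qwen) and the downloads field are illustrative assumptions, not the exact analysis behind these posts:

```python
from huggingface_hub import HfApi

api = HfApi()

# Most-downloaded repos under one org; in practice the smaller
# parameter-count variants usually rank above the largest ones.
for m in api.list_models(author="Qwen", sort="downloads", direction=-1, limit=15):
    print(m.id, m.downloads)
```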
Generally a big fan of LED frame stages, I absolutely loved the Eurovision main stage this year
July 1, 2025 at 4:13 PM
I still think the set design of the Evita revival at the American Rep theater at Harvard was the most stunning interpretation of all time - I hope this concept makes it to Broadway at some point 🤩
July 1, 2025 at 4:06 PM
Generative AI often renders the user invisible in its limited worldview. Please sign up for a short interactive workshop on AI, Misrepresentation, and Mental Health, at both @facct.bsky.social in Athens and Alt-FAccT in NYC! Limited space, so hurry!

Sign up here! tinyurl.com/ai-mirrors
June 17, 2025 at 1:52 PM
Living downtown, literally 2 blocks from the Opera House, is certainly clutch because I'm always late to things. Catch Roméo et Juliette playing in Boston; it was great 😍
June 9, 2025 at 12:11 PM
How do you feel about compulsory vibe workplaces?
June 8, 2025 at 12:40 PM