Avijit Ghosh
@evijit.io
Technical AI Policy Researcher at HuggingFace @hf.co 🤗. Current focus: Responsible AI, AI for Science, and @eval-eval.bsky.social‬!
Incredible work done with literally the smartest and most passionate researchers I am lucky to work with. Paper co-led with @ankareuel.bsky.social and Jenny Chim, and other co-authors!
November 13, 2025 at 2:35 PM
This only strengthens our position that good-quality, independent third-party evaluations are paramount for AI safety.
November 13, 2025 at 2:35 PM
First-party reports are less transparent or of lower quality. We conducted interviews with eval practitioners and found that companies have laid off or reassigned teams dedicated to documentation and social-impact evals, or those teams are being told to focus more on capability reporting.
November 13, 2025 at 2:35 PM
This is true even at the provider level. We find, for example, that Google used to report much more about its model evaluations in 2022 and 2023 but reduced that reporting in the Gemini era, and the same can be seen for Meta over successive Llama versions.
November 13, 2025 at 2:35 PM
We find that model developers have become less transparent about their eval results over time. For instance, environmental-cost reporting in first-party reports (release docs, model cards, system cards) has drastically declined: fewer than 15% mention labor or the environment!
November 13, 2025 at 2:35 PM
Extremely thrilled to talk about our new paper: "Who Evaluates AI’s Social Impacts? Mapping Coverage And Gaps In First And Third Party Evaluations".

This is the first big project output from the @eval-eval.bsky.social coalition! Thread below:
November 13, 2025 at 2:35 PM
Trying to start a new hobby and the internet is useless. Maybe AI will finally kill unstructured information retrieval for good and then we will be forced to call or visit friends for help again
October 12, 2025 at 9:17 PM
We are launching Hugging Science: A global community addressing these barriers through:
✅ Collaborative challenges targeting upstream problems
✅ Cross-disciplinary education
✅ Recognition for data & infrastructure work
✅ Community-owned infrastructure

All links follow 🤗
October 6, 2025 at 4:28 PM
AI for scientific discovery is a social problem: In our new position paper, @cgeorgiaw.bsky.social and I show that culture, incentives, and coordination are the main obstacles to progress, and we are launching the Hugging Science Initiative to address this!
October 6, 2025 at 4:28 PM
So fascinating (not really) to me that company execs and tier 1 AI conferences have gone in completely opposite directions as it relates to AI usage. Surely the best minds actually developing AI models know something about overreliance, productivity, and quality? Surely?
September 28, 2025 at 4:03 PM
How does Claude have the same response? This is sus
September 6, 2025 at 4:30 PM
These official ones are hideous oh god
August 31, 2025 at 4:28 AM
I genuinely want to know the thought process here. Is each model iteration a new being? Is Claude 4.1 its own legal entity deserving of model welfare different from 4.0? Or is it like one human updating their world knowledge and becoming smarter? Was the very first trained Claude the robot embryo?
August 17, 2025 at 2:13 PM
The product decision to discontinue older versions of ChatGPT and the comments on Reddit around that decision reminded me once again of discussions around “robot death”, which is real insofar as people’s feelings and emotions are real.
August 8, 2025 at 5:24 PM
[New] Husbandposting! And yes we had a croquembouche for dessert because I saw it on masterchef once and I’ve always wanted that ❤️
July 15, 2025 at 7:20 PM
Who are the most prolific contributors? Research institutions lead: AI2 (the Allen Institute) emerges as one of the most active, alongside significant activity from IBM, NVIDIA, and international organizations. The open source ecosystem spans far beyond Big Tech!
July 15, 2025 at 2:31 PM
Let's also talk about datasets:

- Most downloaded datasets are evaluation benchmarks (MMLU, SQuAD, GLUE)
- Universities and research institutions dominate foundational data
- Domain-specific datasets thrive in finance, healthcare, robotics, and science
- Open datasets power most AI development!
July 15, 2025 at 2:31 PM
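If you want to sanity-check these download rankings yourself, here's a minimal sketch using the public huggingface_hub client. The sort/direction arguments and the downloads field are assumed to behave as in recent library versions, so treat it as illustrative rather than the exact pipeline behind these numbers:

```python
from huggingface_hub import HfApi

api = HfApi()

# Top datasets on the Hub by all-time downloads; evaluation benchmarks
# like MMLU, SQuAD, and GLUE tend to sit near the top of this list.
# Note: the `downloads` attribute may be None on older library versions.
for ds in api.list_datasets(sort="downloads", direction=-1, limit=10):
    print(ds.id, ds.downloads)
```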
Looking at a single model's stats often does not tell the full story of its usefulness. The Qwen, Llama, and Gemma models have led to a universe of derivative models on the Hub, all made by the community. This is the beauty of open source!
July 15, 2025 at 2:31 PM
Legacy models like CLIP, GPT-2, BERT, etc. remain among the most downloaded models despite being years old, showing that modern chat interfaces represent just one slice of AI applications! The ecosystem is much more diverse than frontier model discussions suggest.
July 15, 2025 at 2:31 PM
Small models consistently outperform large variants in downloads, even within the same model family.
This suggests practical deployment considerations often matter more than maximum capability. The community is building for real-world use, not just benchmarks.
July 15, 2025 at 2:31 PM
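A similarly rough way to eyeball the small-vs-large pattern within one family: pull the most-downloaded repos under a single org and compare the parameter counts in the repo names. The org used here (Qwen) and the downloads field are illustrative assumptions, not the exact analysis behind these posts:

```python
from huggingface_hub import HfApi

api = HfApi()

# Most-downloaded repos under one org; in practice the smaller
# parameter-count variants usually rank above the largest ones.
for m in api.list_models(author="Qwen", sort="downloads", direction=-1, limit=15):
    print(m.id, m.downloads)
```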
Generally a big fan of LED frame stages, I absolutely loved the Eurovision main stage this year
July 1, 2025 at 4:13 PM
I still think the set design of the Evita revival at the American Rep theater at Harvard was the most stunning interpretation of all time - I hope this concept makes it to Broadway at some point 🤩
July 1, 2025 at 4:06 PM
Generative AI often renders the user invisible in its limited worldview. Please sign up for a short interactive workshop on AI, Misrepresentation, and Mental Health, at both @facct.bsky.social in Athens and Alt-FAccT in NYC! Limited space, so hurry!

Sign up here! tinyurl.com/ai-mirrors
June 17, 2025 at 1:52 PM
Living downtown, literally 2 blocks from the Opera House, is certainly clutch because I'm always late to things. Catch Roméo et Juliette playing in Boston; it was great 😍
June 9, 2025 at 12:11 PM
How do you feel about compulsory vibe workplaces?
June 8, 2025 at 12:40 PM