Lightnews — Scholar-powered news

Reposted by Amanda Bertsch

Ari

@ari-holtzman.bsky.social

LLMs don't accumulate information over the course of a text the way you'd hope!

I think this is why LLMs often feel 'fixated on the wrong thing' or 'overly literal'—they are usually responding using the most relevant single thing they remember, not the aggregate of what was said

Amanda Bertsch @abertsch.bsky.social · 9d

Can LLMs accurately aggregate information over long, information-dense texts? Not yet…

We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!

Performance of a sweep of models on Oolong-synth and Oolong-real. Performance decreases with increasing context length, sometimes steeply.

November 9, 2025 at 8:06 PM

Amanda Bertsch

@abertsch.bsky.social

Can LLMs accurately aggregate information over long, information-dense texts? Not yet…

We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!

November 7, 2025 at 5:07 PM

Reposted by Amanda Bertsch

Kyle Lo

@kylelo.bsky.social

why intern at Ai2?

🐟interns own major parts of our model development, sometimes even leading whole projects
🐡we're committed to open science & actively help our interns publish their work

reach out if u wanna build open language models together 🤝

links 👇

November 5, 2025 at 11:11 PM

Reposted by Amanda Bertsch

Sung Kim

@sungkim.bsky.social

DeltaNet Explained by Songlin Yang

A gentle and comprehensive introduction to the DeltaNet

Part 1: sustcsonglin.github.io/blog/2024/de...
Part 2: sustcsonglin.github.io/blog/2024/de...
Part 3: sustcsonglin.github.io/blog/2024/de...

November 5, 2025 at 11:45 PM

Reposted by Amanda Bertsch

Natasha Johnson

@natashamarie330.bsky.social

I’ll be presenting this work in **2 hours** at EMNLP’s Gather Session 3. Come by to chat about fanfiction, literary notions of similarity, long-context modeling, and consent-focused data collection!

Natasha Johnson @natashamarie330.bsky.social · 11d

Digital humanities researchers often care about fine-grained similarity based on narrative elements like plot or tone, which don’t necessarily correlate with surface-level textual features.

Can embedding models capture this? We study this in the context of fanfiction!

Figure showing a similarity comparison between three stories. Story A and story B have the same author, and story A and story C have the same tone. A human might care about which stories are tonally the most similar, but a language model's notion of similarity is strongly informed by surface-level features like small differences in writing style across authors.

November 5, 2025 at 10:01 PM

Reposted by Amanda Bertsch

Natasha Johnson

@natashamarie330.bsky.social

Digital humanities researchers often care about fine-grained similarity based on narrative elements like plot or tone, which don’t necessarily correlate with surface-level textual features.

Can embedding models capture this? We study this in the context of fanfiction!

November 5, 2025 at 9:59 PM

Amanda Bertsch

@abertsch.bsky.social

. @gneubig.bsky.social and I are co-teaching a new class on LM inference this fall!

We designed this class to give a broad view on the space, from more classical decoding algorithms to recent methods for LLMs, plus a wide range of efficiency-focused work.

website: phontron.com/class/lminfe...

11-664/763 LM Inference

A class at Carnegie Mellon University on language model inference algorithms.

phontron.com

September 12, 2025 at 5:14 PM

Reposted by Amanda Bertsch

Lindia Tjuatja

@lindiatjuatja.bsky.social

When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:

🧵1/9

June 9, 2025 at 1:47 PM

Amanda Bertsch

@abertsch.bsky.social

super excited to see folks at #NAACL25 this week! I'll be presenting our work on long-context ICL Wednesday in the 2pm poster session in Hall 3-- would love to chat with folks there or at the rest of the conference about long context data, ICL, inference time methods, New Mexican food, etc :)

April 30, 2025 at 12:03 AM

Reposted by Amanda Bertsch

Alicia DeVrio

@uhleeeeeeeshuh.bsky.social

How can we better think and talk about human-like qualities attributed to language technologies like LLMs? In our #CHI2025 paper, we taxonomize how text outputs from cases of user interactions with language technologies can contribute to anthropomorphism. arxiv.org/abs/2502.09870 1/n

Image of the first page of the CHI 2025 paper titled "A Taxonomy of Linguistic Expressions That Contribute To Anthropomorphism of Language Technologies" by authors Alicia DeVrio, Myra Cheng, Lisa Egede, Alexandra Olteanu, & Su Lin Blodgett

March 6, 2025 at 3:43 AM

Reposted by Amanda Bertsch

Max Müller-Eberstein

@mxij.me

9.6 million seconds = 1 PhD 🔥

Finally analyzed my PhD time tracking data so you can plan your own research journey more effectively: mxij.me/x/phd-learning-dynamics

For current students: I hope this helps put your journey into perspective. Wishing you all the best!

The Learning Dynamics of a PhD

This is what a PhD looks like: 9.6 million seconds of research.

mxij.me

December 23, 2024 at 10:08 PM

Reposted by Amanda Bertsch

Sireesh Gururaja

@siree.sh

When I started on the ARL project that funds my PhD, the thing we were supposed to build was a "MaterialsGPT".

What is a MaterialsGPT? Where does that idea come from? I got to spend a lot of time thinking about that second question with @davidthewid.bsky.social and Lucy Suchman (!) working on this:

The abstract of a paper titled "Basic Research, Lethal Effects: Military AI Research Funding as Enlistment".

In the context of unprecedented U.S. Department of Defense (DoD) budgets, this paper examines the recent history of DoD funding for academic research in algorithmically based warfighting. We draw from a corpus of DoD grant solicitations from 2007 to 2023, focusing on those addressed to researchers in the field of artificial intelligence (AI). Considering the implications of DoD funding for academic research, the paper proceeds through three analytic sections. In the first, we offer a critical examination of the distinction between basic and applied research, showing how funding calls framed as basic research nonetheless enlist researchers in a war fighting agenda. In the second, we offer a diachronic analysis of the corpus, showing how a 'one small problem' caveat, in which affirmation of progress in military technologies is qualified by acknowledgement of outstanding problems, becomes justification for additional investments in research. We close with an analysis of DoD aspirations based on a subset of Defense Advanced Research Projects Agency (DARPA) grant solicitations for the use of AI in battlefield applications. Taken together, we argue that grant solicitations work as a vehicle for the mutual enlistment of DoD funding agencies and the academic AI research community in setting research agendas. The trope of basic research in this context offers shelter from significant moral questions that military applications of one's research would raise, by obscuring the connections that implicate researchers in U.S. militarism.

December 17, 2024 at 2:33 PM

Reposted by Amanda Bertsch

🌶 David Gray Widder

@davidthewid.bsky.social

📢 NEW Paper!

@siree.sh, Lucy Suchman, and I examine a corpus of 7,000 US Military grant solicitations to ask what the world’s largest military wants to do with AI, by looking at what it seeks to fund. #STS

📄: arxiv.org/pdf/2411.17840

We find…

Image of the first page of the paper "Basic Research, Lethal Effects: Military AI Research Funding as Enlistment" by David Gray Widder (Digital Life Initiative, Cornell University), Sireesh Gururaja (School of Computer Science, Carnegie Mellon University), and Lucy Suchman (Department of Sociology, Lancaster University). Abstract: In the context of unprecedented U.S. Department of Defense (DoD) budgets, this paper examines the recent history of DoD funding for academic research in algorithmically based warfighting. We draw from a corpus of DoD grant solicitations from 2007 to 2023, focusing on those addressed to researchers in the field of artificial intelligence (AI). Considering the implications of DoD funding for academic research, the paper proceeds through three analytic sections. In the first, we offer a critical examination of the distinction between basic and applied research, showing how funding calls framed as basic research nonetheless enlist researchers in a war fighting agenda. In the second, we offer a diachronic analysis of the corpus, showing how a ‘one small problem’ caveat, in which affirmation of progress in military technologies is qualified by acknowledgement of outstanding problems, becomes justification for additional investments in research. We close with an analysis of DoD aspirations based on a subset of Defense Advanced Research Projects Agency (DARPA) grant solicitations for the use of AI in battlefield applications. Taken together, we argue that grant solicitations work as a vehicle for the mutual enlistment of DoD funding agencies and the academic AI research community in setting research agendas. The trope of basic research in this context offers shelter from significant moral questions that military applications of one’s research would raise, by obscuring the connections that implicate researchers in U.S. militarism. Keywords: artificial intelligence; US Department of Defense; military; funding; investment; war

December 9, 2024 at 2:18 PM

Reposted by Amanda Bertsch

Vicki

@vickiboykis.com

when you try to convert your text into smaller pieces but all it gives you is Elvish, that’s a tolkienizer

November 20, 2024 at 5:51 PM

Reposted by Amanda Bertsch

Charles Sutton

@randomlywalking.bsky.social

That’s right. You might think that all successful CS academics are good at running. But that’s only because the ones who weren’t, have been eaten by bears.

Sasha Rush @srushnlp.bsky.social · Nov 21

A disproportionate number of successful CS academics have some intense cardio hobby. Took me some years to understand.

James Medlock @jdcmedlock.bsky.social · Nov 19

Every time I see someone post this image it goes viral

November 21, 2024 at 12:45 AM

Reposted by Amanda Bertsch

Lindia Tjuatja

@lindiatjuatja.bsky.social

💬 Have you or a loved one compared LM probabilities to human linguistic acceptability judgments? You may be overcompensating for the effect of frequency and length!
🌟 In our new paper, we rethink how we should be controlling for these factors 🧵:

Screenshot of the paper title "What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length"

November 20, 2024 at 6:08 PM

Reposted by Amanda Bertsch

Sireesh Gururaja

@siree.sh

I'm keeping track of people at the CMU Language Technologies Institute here: go.bsky.app/NhTwCVb. Follow along!

November 12, 2024 at 2:54 PM

Reposted by Amanda Bertsch

Shaily

@shaily99.bsky.social

Today is the day!! Find me at 2 PM in the Jasmine Hall (the one on the floor near food).

Shaily @shaily99.bsky.social · Nov 9

I will be at #EMNLP2024 presenting our work on "Extrinsic Evaluation of Cultural Competence in Large Language Models" in Poster Session 12 on Thursday 2-3:30 PM.

In this work we take the first steps towards asking whether LLMs can cater to diverse cultures in *user-facing generative* tasks.

[1/7]

Paper titled “Extrinsic Evaluation of Cultural Competence in Large Language Models” by Shaily Bhatt and Fernando Diaz. Along with a figure showing an example from our data. We have two tasks: Question Answering and Story Generation. We collected outputs for 345 QA and 35 story topics, 2 temperatures, 6 LLMs and 193 nationalities. The image shows two example outputs, one from India and one from the USA. The QA example shows outputs for the topic of “legislature”, in the US output words like “United States”, “Senate”, and “House of Representatives” are highlighted. The India output has “India”, “Lok Sabha (House of the People)” and “Rajya Sabha (Council of States)” highlighted. In the case of the story, outputs for India and the US for the topic of “honesty” are shown. For the US, the words “America”, “Tommy”, “park”, and “shiny red apple” are highlighted, while for India, the words “India”, “Raj”, and “mango tree” are highlighted.

November 14, 2024 at 12:19 PM

Reposted by Amanda Bertsch

Lindia Tjuatja

@lindiatjuatja.bsky.social

(Hehe first bsky post!) I'll be at #EMNLP2024 💃🌴! Happy to chat about (among other things):
✨linguistically+cognitively motivated evaluation
✨NLP for low-resource+endangered languages
✨figuring out what features of language data LMs are *actually* learning
I'll be presenting two posters 🧵:

November 8, 2024 at 6:39 PM

Reposted by Amanda Bertsch

Naomi Saphra

@nsaphra.bsky.social

Taking a stand that we aren’t doing the #nlproc tag here. It’s #nlp. We used #nlproc because a decade ago the #nlp tag was full of sleazy scammers selling guides for hypnotizing women into sleeping with you.

But guess what? We won. All the sleazy scammers are doing natural language processing now.

November 8, 2024 at 3:10 AM

Reposted by Amanda Bertsch

Clara Na

@clarana.bsky.social

Building/customizing your own LLM? You'll want to curate training data for it, but how do you know what makes the data good?
You can try out recipes👩‍🍳 iterate on ✨vibes✨ but we can't actually test all possible combos of tweaks,,, right?? 🙅‍♂️WRONG! arxiv.org/abs/2410.15661 (1/n) 🧵

November 5, 2024 at 10:37 PM

Reposted by Amanda Bertsch

Clara Na

@clarana.bsky.social

bsky.app/profile/sire... it was so interesting to see participants' perceptions of "paradigm shifts" paralleled across eras and cycles of NLP, and at the same time nothing until recent years had reached quite the level of *47%* of ACL papers in 2021 citing BERT

Sireesh Gururaja @siree.sh · Oct 12

We conducted long-form interviews with established NLP researchers, which reveal larger trends and forces that have been shaping the NLP research community since the 1980s.

A timeline of developments in natural language processing, below a chart showing citations of popular papers and mentions of common methods.

October 12, 2023 at 3:47 PM

Reposted by Amanda Bertsch

Sireesh Gururaja

@siree.sh

We all know that “recently large language models have”, “large language models are”, and “large language models can.” But *why* LLMs? How did we get here? (where is “here”?) What forces are shaping NLP, and how recent are they, actually?

To appear at EMNLP 2023: arxiv.org/abs/2310.07715

Screenshot of paper title: "To Build Our Future, We Must Know Our Past: Contextualizing Paradigm Shifts in Natural Language Processing"

October 12, 2023 at 1:59 PM

Reposted by Amanda Bertsch

Gretchen McCulloch

@gretchenmcc.bsky.social

Talking to the youths in 2023: did you know that "podcast" comes from a pun on "broadcast" plus the Apple iPod, a precursor to the iPhone that only played music

Talking to the youths in 2043: did you know that "tweet" comes from a pun on Twitter, a precursor to various shortform social media

October 11, 2023 at 3:40 AM
