Venkat
@venkatasg.net
Assistant Professor CS @ Ithaca College. Computational Linguist interested in pragmatics & social aspects of communication.

venkatasg.net
Reposted by Venkat
New work to appear @ TACL!

Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.

Yet they often assign higher probability to ungrammatical strings than to grammatical strings.

How can both things be true? 🧵👇
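One toy way to see how both claims can coexist (my illustration of the general point, not an example from the paper): under a unigram "LM" that scores strings by word frequency alone, ignoring order, an ill-formed string of frequent words can outscore a well-formed sentence containing a rare word.

```python
import math

# Toy unigram "LM": the score of a string is the product of its
# word probabilities, with no sensitivity to word order at all.
unigram = {"the": 0.20, "cat": 0.05, "sat": 0.04, "ochre": 0.0001, "sleeps": 0.01}

def logprob(sentence):
    return sum(math.log(unigram[w]) for w in sentence.split())

grammatical = "ochre cat sleeps"    # well-formed, but contains a rare word
ungrammatical = "the the cat the"   # ill-formed, but all frequent words
assert logprob(ungrammatical) > logprob(grammatical)
```

The word probabilities here are made up; the point is only that high probability and grammaticality can come apart when frequency dominates.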
November 10, 2025 at 10:11 PM
Reposted by Venkat
Syntax that spuriously correlates with safe domains can jailbreak LLMs - e.g. below with GPT-4o mini

Our paper (co w/ Vinith Suriyakumar) on syntax-domain spurious correlations will appear at #NeurIPS2025 as a ✨spotlight!

+ @marzyehghassemi.bsky.social, @byron.bsky.social, Levent Sagun
October 24, 2025 at 4:23 PM
Reposted by Venkat
"Although I hate leafy vegetables, I prefer daxes to blickets." Can you tell if daxes are leafy vegetables? LM's can't seem to! 📷

We investigate if LMs capture these inferences from connectives when they cannot rely on world knowledge.

New paper w/ Daniel, Will, @jessyjli.bsky.social
October 16, 2025 at 3:27 PM
Reposted by Venkat
UT Austin Linguistics is hiring in computational linguistics!

Asst or Assoc.

We have a thriving group sites.utexas.edu/compling/ and a long proud history in the space. (For instance, fun fact, Jeff Elman was a UT Austin Linguistics Ph.D.)

faculty.utexas.edu/career/170793

🤘
UT Austin Computational Linguistics Research Group – Humans processing computers processing humans processing language
October 7, 2025 at 8:53 PM
Reposted by Venkat
Excited to present this at #COLM2025 tomorrow! (Tuesday, 11:00 AM poster session)
One of the ways that LLMs can be inconsistent is the "generator-validator gap," where LLMs deem their own answers incorrect.

🎯 We demonstrate that ranking-based discriminator training can significantly reduce this gap, and improvements on one task often generalize to others!

🧵👇
October 6, 2025 at 8:40 PM
Reposted by Venkat
I will be giving a short talk on this work at the COLM Interplay workshop on Friday (also to appear at EMNLP)!

Will be in Montreal all week and excited to chat about LM interpretability + its interaction with human cognition and ling theory.
A key hypothesis in the history of linguistics is that different constructions share underlying structure. We take advantage of recent advances in mechanistic interpretability to test this hypothesis in Language Models.

New work with @kmahowald.bsky.social and @cgpotts.bsky.social!

🧵👇!
October 6, 2025 at 12:05 PM
Reposted by Venkat
I’m excited for COLM this week!

Looking forward to chatting with people about interpretability, data efficient training, cog sci and LLM consistency.
October 4, 2025 at 2:53 PM
Reposted by Venkat
The compling group at UT Austin (sites.utexas.edu/compling/) is looking for PhD students!

Come join me, @kmahowald.bsky.social, and @jessyjli.bsky.social as we tackle interesting research questions at the intersection of ling, cogsci, and ai!

Some topics I am particularly interested in:
September 30, 2025 at 4:17 PM
I love creating this graph every five years over ACL anthology titles and abstracts. Mentions of nuance/fine-grain seem to be doubling every five years 🙃 Nuance rising has yet to level off among *CL publications.
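The counting behind a graph like this can be sketched as follows (a hypothetical sketch: the record format and term list are my assumptions, not the actual pipeline over the Anthology dump):

```python
from collections import Counter

# Terms whose mention we tally in titles/abstracts
TERMS = ("nuance", "fine-grain")

def mentions_per_period(records, period=5):
    """Count papers mentioning any term, bucketed into five-year periods.

    records: iterable of (year, text) pairs, e.g. scraped titles/abstracts.
    """
    counts = Counter()
    for year, text in records:
        if any(t in text.lower() for t in TERMS):
            counts[year - year % period] += 1
    return dict(counts)

sample = [(2016, "A fine-grained analysis of..."),
          (2019, "We study nuance in..."),
          (2021, "A nuanced look at..."),
          (2022, "Parsing with CFGs.")]
assert mentions_per_period(sample) == {2015: 2, 2020: 1}
```

A real version would normalize by the number of papers per period before claiming a doubling.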
September 20, 2025 at 11:15 PM
Reposted by Venkat
Accepted at #NeurIPS2025! So proud of Yulu and Dheeraj for leading this! Be on the lookout for more "nuanced yes/no" work from them in the future 👀
Does vision training change how language is represented and used in meaningful ways?🤔The answer is a nuanced yes! Comparing VLM-LM minimal pairs, we find that while the taxonomic organization of the lexicon is similar, VLMs are better at _deploying_ this knowledge. [1/9]
September 18, 2025 at 4:12 PM
Exhibit N on how synthetic text/AI detectors just don't work reliably. Generating some (long) sentences from GPT-4.1 and GPT-5 with the same prompt, the top open-source model on the RAID benchmark classifies most GPT-4.1 outputs as synthetic and most GPT-5 outputs as not synthetic.
September 10, 2025 at 8:05 PM
Reposted by Venkat
A brilliant linguist and sociolinguist, RIP
Gift article from NYT www.nytimes.com/2025/08/15/u...
Robin Lakoff, Expert on Language and Gender, Is Dead at 82
August 17, 2025 at 12:23 PM
Reposted by Venkat
The Echoes in AI paper showed quite the opposite, also with a story continuation setup.
Additionally, we present evidence that both *syntactic* and *discourse* diversity measures show strong homogenization that the lexical and cosine-similarity measures used in this paper do not capture.
August 12, 2025 at 9:01 PM
Reposted by Venkat
ACL short-paper appreciation thread, because short papers are the best papers. What's the best <=5 pager? I'm nominating an oldie but a goodie
Stolen Probability: A Structural Weakness of Neural Language Models
Neural Network Language Models (NNLMs) generate probability distributions by applying a softmax function to a distance metric formed by taking the dot product of a prediction vector with all word vect...
arxiv.org
August 8, 2025 at 6:49 PM
Reposted by Venkat
Happy to share the published version of "Art, Understanding, and Mystery"! I often hear some version of the thought that it's bad to understand artworks; this paper attempts to make that claim precise and show one way to defend artistic understanding! journals.publishing.umich.edu/ergo/article...
Art, Understanding, and Mystery
Apparent orthodoxy holds that artistic understanding is finally valuable. Artistic understanding—grasping, as such, the features of an artwork that make it aesthetically or artistically good or bad—is...
August 6, 2025 at 9:12 PM
Decided to finally give programming 'agents'/vibe coding a whirl with something low-stakes/low-risk🤞🏾. Typeproof (built almost entirely with Gemini CLI) lets you view Jonathan Hoefler's (non-pangram) typeface proofs for any Google web font...(1/3) venkatasg.net/typeproof/
Typeface Proof
July 30, 2025 at 11:08 AM
Reposted by Venkat
🎉 New Benchmark Alert: KRISTEVA – Close‑Reading for LLMs📚

I’m excited to announce a new paper accepted to ACL 2025, in collaboration with Patrick Sui, Philippe Laban, and others!
July 27, 2025 at 7:19 PM
Reposted by Venkat
🇦🇹I'll be at #ACL2025! Recently I've been thinking about:
✨linguistically + cognitively-motivated evals (as always!)
✨understanding multilingualism + representation learning (new!)

I'll also be presenting a poster for BehaviorBox on Wed @ Poster Session 4 (Hall 4/5, 10-11:30)!
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:

🧵1/9
July 25, 2025 at 6:06 PM
@simonwillison.net I think you might be the best person to answer this question - but does Gemini CLI (free tier) save the output of the chat session somewhere as a log file? Or is it possible to configure that?
July 25, 2025 at 7:22 AM
Reposted by Venkat
Does vision training change how language is represented and used in meaningful ways?🤔The answer is a nuanced yes! Comparing VLM-LM minimal pairs, we find that while the taxonomic organization of the lexicon is similar, VLMs are better at _deploying_ this knowledge. [1/9]
July 22, 2025 at 4:46 AM
Have there been any re-analyses in the news of the 'Is Google Making Us Stupid' article from 2008? I'm not saying LLMs are no different from Google, but if we're asking the same question again, we need to revisit the old debate and reflect.
www.theatlantic.com/magazine/arc...
Is Google Making Us Stupid?
What the Internet is doing to our brains
July 15, 2025 at 10:44 AM
Reposted by Venkat
🦙 how well do LLMs encode discourse knowledge? does that generalize across languages?

🛎️ in our #ACL2025 paper, we uncover fascinating trends about multilingual discourse representations!

joint work w/ @florian-eichin.com @barbaraplank.bsky.social @mhedderich.bsky.social

📄 arxiv.org/abs/2503.10515
July 10, 2025 at 12:38 PM
Reposted by Venkat
I'll be at #ICML to present SPRI next week! Come by our poster on Tuesday, July 15, 4:30pm, and let’s catch up on LLM alignment! 😃

🚀TL;DR: We introduce Situated-PRInciples (SPRI), a framework that automatically generates input-specific principles to align responses — with minimal human effort.

🧵
July 8, 2025 at 3:05 PM
Reposted by Venkat
How do language models track the mental states of each character in a story, often referred to as Theory of Mind?

We reverse-engineered how LLaMA-3-70B-Instruct handles a belief-tracking task and found something surprising: it uses mechanisms strikingly similar to pointer variables in C programming!
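The pointer analogy can be loosely sketched as follows (a hypothetical illustration of the idea, not the paper's actual mechanism): each belief is stored once, and a character's entry holds only an index - a "pointer" - into that store, which gets dereferenced when the belief is needed.

```python
# Hypothetical sketch of pointer-like belief tracking (Sally-Anne style).
# Belief contents live in one store; characters hold indices, not copies.
story_slots = ["box contains chocolate", "box contains pencils"]
char_ptr = {"Sally": 0, "Anne": 1}  # Sally didn't see the swap; Anne did

def belief(character):
    # "Dereference": follow the character's pointer into the store
    return story_slots[char_ptr[character]]

assert belief("Sally") == "box contains chocolate"
assert belief("Anne") == "box contains pencils"
```

Updating what a character believes then means rewriting their pointer, not duplicating the belief content - the indirection the thread compares to pointer variables in C.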
June 24, 2025 at 5:13 PM