Jacob Eisenstein
@jacobeisenstein.bsky.social
natural language processing and computational linguistics at google deepmind.
Nicholas Carlini asking the right questions at #COLM2025
October 9, 2025 at 1:08 PM
thanks! i was more confused about the “kugel” part, but TIL that this is apparently inspired by an airy globe?
August 26, 2025 at 3:37 AM
i think my great-grandmother was the last owner of these books who knew how to read them
August 25, 2025 at 9:52 PM
found some books at my parents’ house
August 25, 2025 at 9:49 PM
Baristas still safe from robotic automation, and not just because robots don’t know what coffee tastes like.

prompt: “I’m trying to dial in this v60 of huatusco with my vario. temp / grind recommendations?”
August 11, 2025 at 4:02 PM
Not only is economic thinking necessary to model and control the interactions between multiple AI systems and their human principals, it may also unlock the sort of modularity that Jordan argues for at the end, enabling complex behaviors to emerge in a predictable way from simple mechanisms.
July 25, 2025 at 10:13 PM
There's a lot to like in this position paper - and not just the "whiff of Frankenstein" quote. www.arxiv.org/abs/2507.06268
July 25, 2025 at 10:13 PM
Sounds good — in theory? 🤓

We use a bunch of synthetic and real data to show when active example selection can help and how much cost-optimal annotation can save.
June 10, 2025 at 3:24 PM
We offer cost-optimal policies for selecting which rater should annotate which examples, which link the cost, the annotation noise, and the *uncertainty* of the cheaper rater.
June 10, 2025 at 3:24 PM
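To make the routing idea concrete, here is a minimal sketch (my own toy model, not the paper's actual policy): send an example to the expensive rater only when the cheap rater's uncertainty, priced in expected labeling errors, outweighs the fee difference. All names and numbers below are illustrative assumptions.

```python
# Hypothetical cost-optimal routing rule: route an example to the expensive
# rater only when the cheap rater's predicted error, weighted by how much an
# error costs downstream, exceeds the price difference between the raters.

def assign_rater(cheap_uncertainty: float,
                 cheap_cost: float,
                 expensive_cost: float,
                 error_penalty: float) -> str:
    """Pick a rater for one example.

    cheap_uncertainty: estimated probability the cheap rater labels it wrong.
    error_penalty: cost we attach to one wrong label.
    """
    # Expected total cost of the cheap rater: its fee plus the expected
    # penalty for a noisy label.
    cheap_total = cheap_cost + cheap_uncertainty * error_penalty
    # The expensive rater is modeled (in this sketch) as noise-free.
    return "cheap" if cheap_total <= expensive_cost else "expensive"

# Example: a $0.10 rater that errs 30% of the time vs. a $1.00 rater, with a
# $2.00 penalty per wrong label: 0.10 + 0.30 * 2.00 = 0.70 <= 1.00, so cheap.
print(assign_rater(0.3, 0.10, 1.00, 2.00))  # "cheap"
```

The key coupling is visible in the rule itself: the break-even uncertainty is (expensive_cost - cheap_cost) / error_penalty, which is how cost, noise, and uncertainty end up linked.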
Confabulation and overconfidence are still problems for LLMs (among others), but it is just not true that these models are somehow technically constrained to make stuff up rather than abstain from answering.
April 23, 2025 at 4:46 PM
An ablation reveals the importance of mechanism design: when the helper identities are known to the asker during training (CSP-DeAnon), calibrated hedging is no longer learned.
March 24, 2025 at 3:39 PM
In practice, collaborative self-play + reinforced self-training (ReST) lead to improved task performance, better calibration of confidence markers, and more efficient tool use.
March 24, 2025 at 3:39 PM
A bit of game theory can help explain when this can work: we model the setup as a game of public utility provision, where the public utility is the extra information provided by the costly retrieval action. The game has a unique equilibrium when the tools are sufficiently distinct (or both bad).
March 24, 2025 at 3:39 PM
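One plausible toy instantiation of this public-goods game (the payoff structure and numbers here are my illustrative assumptions, not the paper's model): each helper decides whether to pay a retrieval cost, the asker's accuracy is the shared benefit, and enumerating pure-strategy equilibria makes the "sufficiently distinct (or both bad)" condition visible.

```python
# Toy public-goods game: each helper chooses whether to pay retrieval cost c;
# the benefit (the asker's accuracy) is shared; a helper's payoff is
# benefit - own cost. We enumerate pure-strategy Nash equilibria.
import itertools

c = 0.2  # cost of using the retrieval tool

def accuracy(a_retrieves, b_retrieves, quality_a, quality_b):
    """Shared benefit: prob. the asker answers correctly given who retrieved."""
    p = 0.5  # accuracy with no retrieval at all
    if a_retrieves:
        p = max(p, quality_a)
    if b_retrieves:
        p = max(p, quality_b)
    return p

def payoffs(a, b, qa, qb):
    benefit = accuracy(a, b, qa, qb)
    return (benefit - c * a, benefit - c * b)

def pure_nash(qa, qb):
    eqs = []
    for a, b in itertools.product([0, 1], repeat=2):
        ua, ub = payoffs(a, b, qa, qb)
        if payoffs(1 - a, b, qa, qb)[0] > ua:  # a has a profitable deviation
            continue
        if payoffs(a, 1 - b, qa, qb)[1] > ub:  # b has a profitable deviation
            continue
        eqs.append((a, b))
    return eqs

# Distinct tools: only helper A's tool covers this question well.
print(pure_nash(qa=0.9, qb=0.55))  # [(1, 0)]: unique equilibrium
# Interchangeable good tools: a coordination problem with two equilibria.
print(pure_nash(qa=0.9, qb=0.9))   # [(0, 1), (1, 0)]
# Both tools bad: uniquely, nobody retrieves.
print(pure_nash(qa=0.6, qb=0.55))  # [(0, 0)]
```

In this toy version, multiplicity only appears when both tools are good and interchangeable; distinct tools (or uniformly bad ones) pin down a single equilibrium, matching the intuition in the post.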
Because the identity of each helper is hidden from the asker, it is forced to rely on confidence signals when faced with incompatible answers from the helpers. Maximizing effort-penalized accuracy of the full rollout can teach the LLM to use these confidence markers correctly.
March 24, 2025 at 3:39 PM
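Here is a sketch of what an effort-penalized accuracy objective might look like when scoring one rollout (the penalty weight and field names are hypothetical, not from the paper). Note that hedging is not rewarded directly; it pays off because the asker's final accuracy improves when helpers mark their confidence honestly.

```python
# Sketch of an effort-penalized accuracy objective for one rollout:
# reward a correct final answer, subtract a charge per costly tool call.
from dataclasses import dataclass

@dataclass
class Rollout:
    final_answer_correct: bool
    num_retrieval_calls: int  # total costly tool calls across all agents

def effort_penalized_reward(rollout: Rollout,
                            effort_penalty: float = 0.1) -> float:
    accuracy = 1.0 if rollout.final_answer_correct else 0.0
    return accuracy - effort_penalty * rollout.num_retrieval_calls

# A correct answer that needed one retrieval beats one that needed three,
# and both beat a confident wrong answer.
print(effort_penalized_reward(Rollout(True, 1)))   # 0.9
print(effort_penalized_reward(Rollout(True, 3)))   # 0.7
print(effort_penalized_reward(Rollout(False, 0)))  # 0.0
```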
We focus on two capabilities: knowing when to use a costly retrieval tool, and hedging non-confident answers. To teach these capabilities, we create a small multi-agent society, in which two "helpers" can use specialized retrieval tools to pass information back to an "asker".
March 24, 2025 at 3:39 PM
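A minimal sketch of how such a rollout might be wired together (the helper and asker interfaces, prompts, and toy tools below are my stand-ins, not the paper's implementation):

```python
# Two anonymous helpers may each consult a specialized retrieval tool; the
# asker then reconciles their possibly conflicting, confidence-marked replies.
from typing import Callable

def run_rollout(question: str,
                helpers: list[Callable[[str], str]],
                asker: Callable[[str, list[str]], str]) -> str:
    # The asker sees only the replies, not which helper produced them, so it
    # must lean on confidence markers to reconcile disagreements.
    replies = [helper(question) for helper in helpers]
    return asker(question, replies)

def make_helper(tool: Callable[[str], str],
                should_retrieve: Callable[[str], bool]) -> Callable[[str], str]:
    def helper(question: str) -> str:
        if should_retrieve(question):  # pay the retrieval cost only when it helps
            evidence = tool(question)
            return f"confident: {evidence}"
        return "unsure; answering from memory"  # hedged, no tool cost
    return helper

# Toy usage: helper A's tool covers geography, helper B's covers chemistry.
geo = make_helper(lambda q: "atlas says canberra", lambda q: "capital" in q)
chem = make_helper(lambda q: "periodic table", lambda q: "element" in q)
print(run_rollout("what is the capital of australia?",
                  helpers=[geo, chem],
                  asker=lambda q, rs: max(rs, key=lambda r: r.startswith("confident"))))
# -> "confident: atlas says canberra"
```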
what’s up seattle
December 12, 2024 at 6:56 AM
hmmm
December 12, 2024 at 6:45 AM
it’s not clear why we took the long way around this, but i guess it is probably a nice view during the daytime
December 12, 2024 at 5:10 AM
realizing that my coding setup is not very robust to poor internet connectivity
December 12, 2024 at 3:38 AM