Jacob Eisenstein
@jacobeisenstein.bsky.social
natural language processing and computational linguistics at google deepmind.
Nicholas Carlini asking the right questions at #COLM2025
October 9, 2025 at 1:08 PM
thanks! i was more confused about the “kugel” part, but TIL that this is apparently inspired by an airy globe?
August 26, 2025 at 3:37 AM
i think my great-grandmother was the last owner of these books who knew how to read them
August 25, 2025 at 9:52 PM
found some books at my parents’ house
August 25, 2025 at 9:49 PM
Baristas still safe from robotic automation, and not just because robots don’t know what coffee tastes like.

prompt: “I’m trying to dial in this v60 of huatusco with my vario. temp / grind recommendations?”
August 11, 2025 at 4:02 PM
Not only is economic thinking necessary to model and control the interactions between multiple AI systems and their human principals, it may also unlock the sort of modularity that Jordan argues for at the end, enabling complex behaviors to emerge in a predictable way from simple mechanisms.
July 25, 2025 at 10:13 PM
There's a lot to like in this position paper - and not just the "whiff of Frankenstein" quote. www.arxiv.org/abs/2507.06268
July 25, 2025 at 10:13 PM
Sounds good — in theory? 🤓

We use a bunch of synthetic and real data to show when active example selection can help and how much cost-optimal annotation can save.
June 10, 2025 at 3:24 PM
We offer cost-optimal policies for selecting which rater should annotate which examples, which link the cost, the annotation noise, and the *uncertainty* of the cheaper rater.
June 10, 2025 at 3:24 PM
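To make the routing idea concrete, here is a minimal sketch (my own toy model, not the paper's actual policy): send an example to the expensive rater only when the cheap rater's uncertainty, priced in expected labeling errors, outweighs the fee difference. All names and numbers below are illustrative assumptions.

```python
# Hypothetical cost-optimal routing rule: route an example to the expensive
# rater only when the cheap rater's predicted error, weighted by how much an
# error costs downstream, exceeds the price difference between the raters.

def assign_rater(cheap_uncertainty: float,
                 cheap_cost: float,
                 expensive_cost: float,
                 error_penalty: float) -> str:
    """Pick a rater for one example.

    cheap_uncertainty: estimated probability the cheap rater labels it wrong.
    error_penalty: cost we attach to one wrong label.
    """
    # Expected total cost of the cheap rater: its fee plus the expected
    # penalty for a noisy label.
    cheap_total = cheap_cost + cheap_uncertainty * error_penalty
    # The expensive rater is modeled (in this sketch) as noise-free.
    return "cheap" if cheap_total <= expensive_cost else "expensive"

# Example: a $0.10 rater that errs 30% of the time vs. a $1.00 rater, with a
# $2.00 penalty per wrong label: 0.10 + 0.30 * 2.00 = 0.70 <= 1.00, so cheap.
print(assign_rater(0.3, 0.10, 1.00, 2.00))  # "cheap"
```

The key coupling is visible in the rule itself: the break-even uncertainty is (expensive_cost - cheap_cost) / error_penalty, which is how cost, noise, and uncertainty end up linked.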
Confabulation and overconfidence are still problems for LLMs (among others), but it is just not true that these models are somehow technically constrained to make stuff up rather than abstain from answering.
April 23, 2025 at 4:46 PM
An ablation reveals the importance of mechanism design: when the helper identities are known to the asker during training (CSP-DeAnon), calibrated hedging is no longer learned.
March 24, 2025 at 3:39 PM
In practice, collaborative self-play + reinforced self-training (ReST) lead to improved task performance, better calibration of confidence markers, and more efficient tool use.
March 24, 2025 at 3:39 PM
A bit of game theory can help explain when this can work: we model the setup as a game of public utility provision, where the public utility is the extra information provided by the costly retrieval action. The game has a unique equilibrium when the tools are sufficiently distinct (or both bad).
March 24, 2025 at 3:39 PM
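One plausible toy instantiation of this public-goods game (the payoff structure and numbers here are my illustrative assumptions, not the paper's model): each helper decides whether to pay a retrieval cost, the asker's accuracy is the shared benefit, and enumerating pure-strategy equilibria makes the "sufficiently distinct (or both bad)" condition visible.

```python
# Toy public-goods game: each helper chooses whether to pay retrieval cost c;
# the benefit (the asker's accuracy) is shared; a helper's payoff is
# benefit - own cost. We enumerate pure-strategy Nash equilibria.
import itertools

c = 0.2  # cost of using the retrieval tool

def accuracy(a_retrieves, b_retrieves, quality_a, quality_b):
    """Shared benefit: prob. the asker answers correctly given who retrieved."""
    p = 0.5  # accuracy with no retrieval at all
    if a_retrieves:
        p = max(p, quality_a)
    if b_retrieves:
        p = max(p, quality_b)
    return p

def payoffs(a, b, qa, qb):
    benefit = accuracy(a, b, qa, qb)
    return (benefit - c * a, benefit - c * b)

def pure_nash(qa, qb):
    eqs = []
    for a, b in itertools.product([0, 1], repeat=2):
        ua, ub = payoffs(a, b, qa, qb)
        if payoffs(1 - a, b, qa, qb)[0] > ua:  # a has a profitable deviation
            continue
        if payoffs(a, 1 - b, qa, qb)[1] > ub:  # b has a profitable deviation
            continue
        eqs.append((a, b))
    return eqs

# Distinct tools: only helper A's tool covers this question well.
print(pure_nash(qa=0.9, qb=0.55))  # [(1, 0)]: unique equilibrium
# Interchangeable good tools: a coordination problem with two equilibria.
print(pure_nash(qa=0.9, qb=0.9))   # [(0, 1), (1, 0)]
# Both tools bad: uniquely, nobody retrieves.
print(pure_nash(qa=0.6, qb=0.55))  # [(0, 0)]
```

In this toy version, multiplicity only appears when both tools are good and interchangeable; distinct tools (or uniformly bad ones) pin down a single equilibrium, matching the intuition in the post.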
Because the identity of each helper is hidden from the asker, it is forced to rely on confidence signals when faced with incompatible answers from the helpers. Maximizing effort-penalized accuracy of the full rollout can teach the LLM to use these confidence markers correctly.
March 24, 2025 at 3:39 PM
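Here is a sketch of what an effort-penalized accuracy objective might look like when scoring one rollout (the penalty weight and field names are hypothetical, not from the paper). Note that hedging is not rewarded directly; it pays off because the asker's final accuracy improves when helpers mark their confidence honestly.

```python
# Sketch of an effort-penalized accuracy objective for one rollout:
# reward a correct final answer, subtract a charge per costly tool call.
from dataclasses import dataclass

@dataclass
class Rollout:
    final_answer_correct: bool
    num_retrieval_calls: int  # total costly tool calls across all agents

def effort_penalized_reward(rollout: Rollout,
                            effort_penalty: float = 0.1) -> float:
    accuracy = 1.0 if rollout.final_answer_correct else 0.0
    return accuracy - effort_penalty * rollout.num_retrieval_calls

# A correct answer that needed one retrieval beats one that needed three,
# and both beat a confident wrong answer.
print(effort_penalized_reward(Rollout(True, 1)))   # 0.9
print(effort_penalized_reward(Rollout(True, 3)))   # 0.7
print(effort_penalized_reward(Rollout(False, 0)))  # 0.0
```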
We focus on two capabilities: knowing when to use a costly retrieval tool, and hedging non-confident answers. To teach these capabilities, we create a small multi-agent society, in which two "helpers" can use specialized retrieval tools to pass information back to an "asker".
March 24, 2025 at 3:39 PM
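A minimal sketch of how such a rollout might be wired together (the helper and asker interfaces, prompts, and toy tools below are my stand-ins, not the paper's implementation):

```python
# Two anonymous helpers may each consult a specialized retrieval tool; the
# asker then reconciles their possibly conflicting, confidence-marked replies.
from typing import Callable

def run_rollout(question: str,
                helpers: list[Callable[[str], str]],
                asker: Callable[[str, list[str]], str]) -> str:
    # The asker sees only the replies, not which helper produced them, so it
    # must lean on confidence markers to reconcile disagreements.
    replies = [helper(question) for helper in helpers]
    return asker(question, replies)

def make_helper(tool: Callable[[str], str],
                should_retrieve: Callable[[str], bool]) -> Callable[[str], str]:
    def helper(question: str) -> str:
        if should_retrieve(question):  # pay the retrieval cost only when it helps
            evidence = tool(question)
            return f"confident: {evidence}"
        return "unsure; answering from memory"  # hedged, no tool cost
    return helper

# Toy usage: helper A's tool covers geography, helper B's covers chemistry.
geo = make_helper(lambda q: "atlas says canberra", lambda q: "capital" in q)
chem = make_helper(lambda q: "periodic table", lambda q: "element" in q)
print(run_rollout("what is the capital of australia?",
                  helpers=[geo, chem],
                  asker=lambda q, rs: max(rs, key=lambda r: r.startswith("confident"))))
# -> "confident: atlas says canberra"
```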
what’s up seattle
December 12, 2024 at 6:56 AM
hmmm
December 12, 2024 at 6:45 AM
it’s not clear why we took the long way around this, but i guess it is probably a nice view during the daytime
December 12, 2024 at 5:10 AM
realizing that my coding setup is not very robust to poor internet connectivity
December 12, 2024 at 3:38 AM