@momergul.bsky.social
CS PhD Student @Cornell
Was a challenge getting everything to fit 🙈
October 2, 2025 at 8:02 PM
Work done with my great advisors Claire Cardie & Tanya Goyal.

Paper: arxiv.org/abs/2510.01152
Github link for code and checkpoints: github.com/momergul/mash
Pay-Per-Search Models are Abstention Models
LLMs cannot reliably recognize their parametric knowledge boundaries and often hallucinate answers to outside-of-boundary questions. In contrast, humans recognize their limitations and can either seek...
October 2, 2025 at 7:40 PM
Tons of other insights in the paper. We show that the strength of the search tool is a key consideration: replacing our retriever with an oracle makes all models converge to always seeking help. The noisiness of the retriever is a feature, not a bug!
October 2, 2025 at 7:40 PM
Baseline RL implementations often converge to sub-optimal policies that always or never search. MASH uses a lightweight warm-start data generation & SFT pipeline that induces better search behaviors. MASH models learn to use a mix of 0/1/2 searches as needed, while baselines fail to.
October 2, 2025 at 7:40 PM
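A minimal sketch of a warm-start pipeline in the spirit described above: seed the SFT data with traces that use 0, 1, or 2 searches depending on whether the model already knows the answer, so RL does not collapse into always- or never-searching. The function and field names here are illustrative assumptions, not the paper's actual implementation.

```python
import random

def make_warmstart_trace(question: str, gold: str, model_knows: bool) -> dict:
    """Build one SFT example whose search count depends on model knowledge.

    Hypothetical sketch: if the base model can already answer from
    parametric memory, the trace answers directly (0 searches); otherwise
    it demonstrates one or two search calls (single- vs multi-hop).
    """
    if model_knows:
        num_searches = 0
    else:
        num_searches = random.choice([1, 2])
    return {"question": question, "target": gold, "num_searches": num_searches}

random.seed(0)
trace = make_warmstart_trace("Who wrote Hamlet?", "Shakespeare", model_knows=True)
print(trace["num_searches"])  # 0
```

The point of the mix is exploration: RL can then prune or extend search usage per question instead of inheriting a degenerate always/never policy.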
For (ii), MASH shows strong abstention behavior off-the-shelf! Its performance is comparable to abstention baselines that require pre-determining knowledge boundaries and model-specific training data. It beats SFT approaches and is competitive with DPO!
October 2, 2025 at 7:40 PM
We evaluate MASH under 2 settings: (i) w/ access to search, (ii) w/o search as an abstention model.

For (i), MASH outperforms efficient search baselines, esp. for multi-hop datasets (7.6% accuracy boost), even matching search baselines w/o any search penalties!
October 2, 2025 at 7:40 PM
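Setting (ii) can be sketched in a few lines: run the search-trained model with the tool removed, and read any attempted search call as an abstention. The `<search>` tag format is an assumption for illustration; the actual tool-call syntax may differ.

```python
def answer_or_abstain(generation: str) -> str:
    """Map a raw model generation to an answer or an abstention signal.

    If the model tries to invoke a search it no longer has access to,
    interpret that as "I can't answer this from parametric memory."
    """
    if "<search>" in generation:
        return "ABSTAIN"
    return generation.strip()

print(answer_or_abstain("The capital of France is Paris."))
print(answer_or_abstain("<search>capital of Freedonia</search>"))
```

No extra training or threshold tuning is needed for this mapping, which is what "off-the-shelf abstention" refers to above.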
💡Key idea: Reward accuracy but penalize searches during training. Under the right optimization pressure, LLMs learn to invoke search when their parametric knowledge is lacking. At inference, we simply remove this search access and treat any search invocation as a proxy for abstention!
October 2, 2025 at 7:40 PM
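The pay-per-search reward above can be sketched as accuracy minus a per-search charge. The exact penalty value and function names below are illustrative assumptions, not the paper's settings.

```python
def pay_per_search_reward(answer: str, gold: str, num_searches: int,
                          search_penalty: float = 0.2) -> float:
    """Reward a correct answer, but charge a fixed cost per search call."""
    accuracy = 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0
    return accuracy - search_penalty * num_searches

# A correct answer found without search earns the full reward;
# the same answer reached via two searches earns less.
print(pay_per_search_reward("Paris", "Paris", 0))  # 1.0
print(pay_per_search_reward("Paris", "Paris", 2))  # 0.6
```

Under this pressure, searching only pays off when the expected accuracy gain exceeds the search cost, which is exactly when parametric knowledge is lacking.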