testerpce.bsky.social
@testerpce.bsky.social
Reposted
Can't wait for when I can vibe code a production recommender system.

Until then, here's some system designs:

• Retrieval vs. Ranking: eugeneyan.com/writing/syst...
• Real-time retrieval: eugeneyan.com/writing/real...
• Personalization: eugeneyan.com/writing/patt...
April 8, 2025 at 5:14 AM
Awesome working with the IITB folks. Super happy to see this
Soumen Kumar Mondal, Sayambhu Sen, Abhishek Singhania, Preethi Jyothi
Language-specific Neurons Do Not Facilitate Cross-Lingual Transfer
https://arxiv.org/abs/2503.17456
March 25, 2025 at 2:52 PM
Reposted
Dataset Distillation (2018/2020)

They show that it is possible to compress 60,000 MNIST training images into just 10 synthetic distilled images (one per class) and achieve close to original performance with only a few gradient descent steps, given a fixed network initialization.
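The bilevel recipe behind dataset distillation can be sketched on a toy problem: learn a handful of synthetic points such that a single gradient step from a fixed initialization on those points yields a model that does well on the full real dataset. Everything below (the two-blob data, the logistic model, the finite-difference outer loop) is an illustrative stand-in for the paper's MNIST setup, not its actual method code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" dataset: two Gaussian blobs, one per class (stand-in for MNIST).
X_real = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y_real = np.array([0] * 100 + [1] * 100)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, X, y):
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def grad(w, X, y):
    return X.T @ (sigmoid(X @ w) - y) / len(y)

w0 = np.zeros(2)   # fixed network initialization, as in the paper
inner_lr = 1.0
y_syn = np.array([0, 1])  # one synthetic point per class

def inner_step(X_syn):
    """One gradient step on the synthetic set, starting from the fixed init."""
    return w0 - inner_lr * grad(w0, X_syn, y_syn)

# Outer loop: optimize the synthetic "pixels" so the inner step generalizes.
# A finite-difference gradient keeps the sketch dependency-free.
X_syn = rng.normal(0, 0.1, (2, 2))
eps, outer_lr = 1e-4, 0.5
for _ in range(200):
    g = np.zeros_like(X_syn)
    base = loss(inner_step(X_syn), X_real, y_real)
    for i in range(2):
        for j in range(2):
            X_pert = X_syn.copy()
            X_pert[i, j] += eps
            g[i, j] = (loss(inner_step(X_pert), X_real, y_real) - base) / eps
    X_syn -= outer_lr * g

w1 = inner_step(X_syn)
acc = np.mean((sigmoid(X_real @ w1) > 0.5) == y_real)
print(f"real-data accuracy after one step on 2 distilled points: {acc:.2f}")
```

The paper backpropagates through the inner update instead of using finite differences, and distills real images rather than 2-D points, but the nested structure is the same.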
March 5, 2025 at 12:23 AM
Reposted
"Wild posteriors in the wild"

A cool-looking paper if you're interested in funky posterior geometry

Link: arxiv.org/abs/2503.00239
Code: github.com/YunyiShen/we...

#stats
March 5, 2025 at 3:36 AM
Reposted
Anyone interested in interactive story generation? I genuinely love this stuff, and would love to talk about it with anyone who's interested

This is a little tool I made to experiment with generating murder mysteries automatically~
February 28, 2025 at 9:29 PM
Reposted
Xiaoyu Deng, Ye Zhang, Tianmin Guo, Yongzhe Zhang, Zhengjian Kang, Hang Yang
ChallengeMe: An Adversarial Learning-enabled Text Summarization Framework
https://arxiv.org/abs/2502.05084
February 10, 2025 at 6:36 AM
Reposted
Roman Vashurin (Mohamed bin Zayed University of Artificial Intelligence), ...
CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs
https://arxiv.org/abs/2502.04964
February 10, 2025 at 6:37 AM
Reposted
Haohao Zhu, Junyu Lu, Zeyuan Zeng, Zewen Bai, Xiaokun Zhang, Liang Yang, Hongfei Lin
Commonality and Individuality! Integrating Humor Commonality with Speaker Individuality for Humor Recognition
https://arxiv.org/abs/2502.04960
February 10, 2025 at 6:38 AM
Reposted
@xtimv.bsky.social and I were just discussing this interesting comment in the DeepSeek paper introducing GRPO: a different way of setting up the KL loss.

It's a little hard to reason about what this does to the objective. 1/
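For context, the GRPO paper estimates the KL penalty per sample as r - log(r) - 1 with r = pi_ref/pi_theta (Schulman's "k3" estimator), rather than the naive log-ratio. A minimal numpy sketch on toy categorical policies (the distributions here are made up for illustration) shows both estimators converge to the true KL, but k3 is nonnegative on every sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy categorical distributions standing in for pi_theta and pi_ref.
p_theta = np.array([0.5, 0.3, 0.2])   # current policy
p_ref   = np.array([0.4, 0.4, 0.2])   # reference policy

true_kl = np.sum(p_theta * np.log(p_theta / p_ref))  # KL(pi_theta || pi_ref)

# Sample actions from pi_theta, as during RL rollouts.
acts = rng.choice(3, size=200_000, p=p_theta)
r = p_ref[acts] / p_theta[acts]       # likelihood ratio pi_ref / pi_theta

k1 = np.mean(-np.log(r))              # naive estimator: log p_theta - log p_ref
k3 = np.mean(r - np.log(r) - 1)      # GRPO-style estimator, >= 0 per sample

print(true_kl, k1, k3)
```

Both are unbiased (E[r] = 1 under pi_theta, so the extra r - 1 term has zero mean), which is part of why it is subtle to reason about what the per-sample difference does to the gradient.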
February 10, 2025 at 4:32 AM
Reposted
ByteDance's UltraMem

A new ultra-sparse model that
- exhibits favorable scaling properties and outperforms MoE
- achieves 1.7x to 6.0x faster inference than MoE

Paper: Ultra-Sparse Memory Network ( arxiv.org/abs/2411.12364 )
February 10, 2025 at 6:42 AM
Reposted
Masato Mita, Ryo Yoshida, Yohei Oseki
Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition
https://arxiv.org/abs/2502.04795
February 10, 2025 at 6:47 AM
Reposted
Jing Yang, Max Glockner, Anderson Rocha, Iryna Gurevych
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks
https://arxiv.org/abs/2502.04797
February 10, 2025 at 6:47 AM
Reposted
Santiago González-Silot, Andrés Montoro-Montarroso, Eugenio Martínez Cámara, Juan Gómez-Romero
Enhancing Disinformation Detection with Explainable AI and Named Entity Replacement
https://arxiv.org/abs/2502.04863
February 10, 2025 at 6:46 AM
Reposted
Herbert Ullrich, Tomáš Mlynář, Jan Drchal
Claim Extraction for Fact-Checking: Data, Models, and Automated Metrics
https://arxiv.org/abs/2502.04955
February 10, 2025 at 6:45 AM
Reposted
"All you need to build a strong reasoning model is the right data mix."

The pipeline that creates the data mix:
January 26, 2025 at 11:30 PM
Reposted
Zhipu AI's T1 (open-source: paper, code, dataset, and model)

Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

T1 is trained by scaling RL with exploration encouraged, and the work studies how it benefits from inference scaling.
January 27, 2025 at 1:49 AM
Reposted
Naihao Deng, Sheng Zhang, Henghui Zhu, Shuaichen Chang, Jiani Zhang, Alexander Hanbo Li, Chung-Wei Hang, Hideo Kobayashi, Yiqun Hu, Patrick Ng
Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models
https://arxiv.org/abs/2501.14717
January 27, 2025 at 5:45 AM
Reposted
Ziyao Xu, Houfeng Wang
Investigating the (De)Composition Capabilities of Large Language Models in Natural-to-Formal Language Conversion
https://arxiv.org/abs/2501.14649
January 27, 2025 at 6:26 AM
Reposted
Jia Yu, Fei Yuan, Rui Min, Jing Yu, Pei Chu, Jiayang Li, Wei Li, Ruijie Zhang, Zhenxiang Li, Zhifei Ren, Dong Zheng, Wenjian Zhang, Yan Teng, Lingyu Meng, ...
WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages
https://arxiv.org/abs/2501.14506
January 27, 2025 at 6:42 AM
Reposted
Jie He, Yijun Yang, Wanqiu Long, Deyi Xiong, Victor Gutierrez Basulto, Jeff Z. Pan
Evaluating and Improving Graph to Text Generation with Large Language Models
https://arxiv.org/abs/2501.14497
January 27, 2025 at 6:43 AM
Reposted
Verena Blaschke, Masha Fedzechkina, Maartje ter Hoeve
Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter
https://arxiv.org/abs/2501.14491
January 27, 2025 at 6:56 AM
Reposted
Zeping Yu, Sophia Ananiadou
Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing
https://arxiv.org/abs/2501.14457
January 27, 2025 at 7:06 AM
Reposted
Xu Chu, Zhijie Tan, Hanlin Xue, Guanyu Wang, Tong Mo, Weiping Li
Domaino1s: Guiding LLM Reasoning for Explainable Answers in High-Stakes Domains
https://arxiv.org/abs/2501.14431
January 27, 2025 at 7:06 AM
Reposted
Xinyu Ma, Yifeng Xu, Yang Lin, Tianlong Wang, Xu Chu, Xin Gao, Junfeng Zhao, Yasha Wang
DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing
https://arxiv.org/abs/2501.14371
January 27, 2025 at 7:12 AM