Sumit
@reachsumit.com
190 followers 36 following 1.8K posts
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain! Blog: https://blog.reachsumit.com/ Newsletter: https://recsys.substack.com/
RAG-Anything: All-in-One RAG Framework

Introduces a unified framework enabling comprehensive knowledge retrieval across all modalities through dual-graph construction and cross-modal hybrid retrieval.

📝 arxiv.org/abs/2510.12323
👨🏽‍💻 github.com/HKUDS/RAG-An...
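
A minimal sketch of what cross-modal hybrid retrieval can look like: fuse a dense similarity score with a graph-adjacency score over multimodal chunks. Illustrative only; the fusion weight, score functions, and graph layout are my assumptions, not the paper's code.

import numpy as np

def hybrid_scores(query_vec, chunk_vecs, adjacency, seed_ids, alpha=0.7):
    """chunk_vecs: (N, d) unit-normalized embeddings of multimodal chunks.
    adjacency: (N, N) 0/1 knowledge-graph matrix over the same chunks.
    seed_ids: chunk indices already matched on the graph side."""
    dense = chunk_vecs @ query_vec                 # cosine similarity
    graph = adjacency[:, seed_ids].max(axis=1)     # 1 if a neighbor of a seed
    return alpha * dense + (1 - alpha) * graph     # late fusion of both signals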
A Longitudinal Study on Different Annotator Feedback Loops in Complex RAG Tasks

Compares internal and external annotator groups over one year, finding that closer feedback loops produce higher-quality data but with reduced quantity and diversity.

📝 arxiv.org/abs/2510.11897
Reinforced Preference Optimization for Recommendation

Introduces an RL framework for LLM-based recommenders that uses constrained beam search and ranking rewards to improve negative sampling and ranking performance.

📝 arxiv.org/abs/2510.12211
👨🏽‍💻 github.com/sober-clever...
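
A toy version of the ranking-reward idea: score the ground-truth item against hard negatives surfaced by constrained beam search over valid item ids. The reward shape and margin below are assumptions, not the paper's objective.

import torch

def ranking_reward(pos_logprob, neg_logprobs, margin=0.1):
    """pos_logprob: scalar log-likelihood of the ground-truth item id.
    neg_logprobs: (K,) log-likelihoods of beam-searched hard negatives."""
    # Fraction of negatives the positive beats by at least `margin`.
    return (pos_logprob - neg_logprobs > margin).float().mean()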
Simple Projection Variants Improve ColBERT Performance

Shows that replacing ColBERT's single-layer linear projection with deeper FFNs featuring residual connections and upscaled intermediate projections improves retrieval performance.

📝 arxiv.org/abs/2510.12327
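
The change is easy to picture in PyTorch: replace the single nn.Linear that maps token states down to ColBERT's low output dimension with a residual FFN. Widths below are illustrative, not the paper's exact configuration.

import torch.nn as nn

class ResidualFFNProjection(nn.Module):
    def __init__(self, hidden=768, inter=3072, out=128):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(hidden, inter),      # upscaled intermediate projection
            nn.GELU(),
            nn.Linear(inter, hidden),
        )
        self.down = nn.Linear(hidden, out) # per-token down-projection

    def forward(self, token_states):
        return self.down(token_states + self.ffn(token_states))  # residual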
SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression

Alibaba proposes Sequential Matryoshka Embedding Compression to reduce gradient variance during training and adaptively select important embedding dimensions.

📝 arxiv.org/abs/2510.12474
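
The Matryoshka part is easy to sketch: a contrastive loss computed on nested embedding prefixes. Training one truncation length per stage (rather than summing all prefix losses at once) is my loose reading of "sequential"; the actual schedule is the paper's, not shown here.

import torch
import torch.nn.functional as F

def prefix_loss(q, d, dim, tau=0.05):
    """In-batch contrastive loss using only the first `dim` dimensions."""
    qs = F.normalize(q[:, :dim], dim=-1)
    ds = F.normalize(d[:, :dim], dim=-1)
    logits = qs @ ds.t() / tau
    return F.cross_entropy(logits, torch.arange(q.size(0)))

# Sequential stages instead of one summed multi-prefix loss:
#   for dim in (64, 128, 256, 768): optimize prefix_loss(q, d, dim)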
SMILE: SeMantic Ids Enhanced CoLd Item Representation for Click-through Rate Prediction in E-commerce SEarch

Kuaishou uses RQ-OPQ encoding to enhance cold-start item representations by aligning collaborative signals with semantic information.

📝 arxiv.org/abs/2510.12604
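
The "RQ" half is a residual quantizer: encode an item embedding as a short tuple of codebook ids, coarse to fine. Toy version below; the OPQ rotation and the collaborative-semantic alignment are omitted, and sizes are made up.

import numpy as np

def rq_encode(x, codebooks):
    """x: (d,) item embedding. codebooks: list of (K, d) arrays."""
    ids, residual = [], x.copy()
    for cb in codebooks:                            # one level per codebook
        j = int(((residual - cb) ** 2).sum(axis=1).argmin())
        ids.append(j)
        residual -= cb[j]                           # quantize what remains
    return ids                                      # the semantic-id tuple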
The Role of Parametric Injection: A Systematic Study of Parametric Retrieval-Augmented Generation

Finds that parametric representations capture only partial semantic information but can enhance document understanding when combined with textual context.

📝 arxiv.org/abs/2510.12668
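
"Parametric" here means the document rides along as a weight delta rather than (or in addition to) prompt text. A toy LoRA-style injection into one linear layer, not PRAG's actual machinery:

import torch
import torch.nn as nn

class InjectedLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8):
        super().__init__()
        self.base = base                                       # frozen LLM weight
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # doc-specific
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # doc-specific

    def forward(self, x):
        return self.base(x) + x @ self.A.t() @ self.B.t()      # Wx + BAx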
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

ByteDance presents a foundation model that supports multifaceted multimodal retrieval and classification by accommodating arbitrary modality inputs, including text, vision, and audio.

📝 arxiv.org/abs/2510.12709
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search

Apple introduces a multimodal LLM capable of on-demand, multi-turn web searches with dynamically crafted queries for image and text search tools.

📝 arxiv.org/abs/2510.12801
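
The control flow is the interesting bit: per turn, the model decides whether to search by image, search by text with a self-written query, or answer. Everything below is a hypothetical driver; mllm_step, image_search, and text_search are placeholder callables, not the paper's API.

def answer(question, image, mllm_step, image_search, text_search, max_turns=4):
    context = []
    for _ in range(max_turns):
        act = mllm_step(question, image, context)          # model picks an action
        if act["type"] == "image_search":
            context.append(image_search(act["crop"]))      # search on an image crop
        elif act["type"] == "text_search":
            context.append(text_search(act["query"]))      # dynamically crafted query
        else:
            return act["text"]                             # "answer" action
    return mllm_step(question, image, context)["text"]     # forced final answer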
Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation

Surveys table question answering approaches in the LLM era, categorizing task setups, modeling strategies, and evaluation methods.

📝 arxiv.org/abs/2510.09671
HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks

Introduces a framework to evaluate human performance on text embedding benchmarks, revealing that humans rank 4th among models with 77.6% average performance across 16 MTEB tasks.

📝 arxiv.org/abs/2510.10062
Domain-Specific Data Generation Framework for RAG Adaptation

Introduces a scalable framework for generating domain-grounded question-answer-context triples to enhance RAG system adaptation across diverse domains and components.

📝 arxiv.org/abs/2510.11217
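
A stripped-down version of the triple-generation loop; llm is any text-completion callable you bring, and the prompts and quality gate are simplified stand-ins for the paper's pipeline.

def make_triples(chunks, llm):
    triples = []
    for chunk in chunks:
        q = llm(f"Write one question answerable only from this passage:\n{chunk}")
        a = llm(f"Context:\n{chunk}\n\nQuestion: {q}\nAnswer concisely:")
        if a and a.strip().lower() not in {"unknown", "n/a"}:  # crude filter
            triples.append({"question": q, "answer": a, "context": chunk})
    return triples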
MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest

Pinterest introduces a two-tower architecture that unifies multiple ad domains and optimization tasks using mixture-of-experts, replacing 9 production models.

📝 arxiv.org/abs/2510.09857
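
Schematically, one tower with shared experts and per-task heads replaces a zoo of single-task models. Sizes, task names, and the gating scheme below are illustrative, not Pinterest's production setup.

import torch
import torch.nn as nn

class MoETower(nn.Module):
    def __init__(self, d_in, d_out=64, n_experts=4, tasks=("ctr", "ctcvr")):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, 128), nn.ReLU()) for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_in, n_experts)
        self.heads = nn.ModuleDict({t: nn.Linear(128, d_out) for t in tasks})

    def forward(self, x, task):
        w = torch.softmax(self.gate(x), dim=-1)                       # (B, E)
        mix = sum(w[:, i:i + 1] * e(x) for i, e in enumerate(self.experts))
        return self.heads[task](mix)                  # task-specific embedding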
LinearRAG: Linear Graph Retrieval Augmented Generation on Large-Scale Corpora

Constructs relation-free hierarchical graphs using lightweight entity extraction, reducing indexing time by over 77% while outperforming existing GraphRAG methods.

📝 arxiv.org/abs/2510.10114
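
"Relation-free" means the graph is just entity-to-passage links with no LLM relation-extraction pass, which is where the indexing savings come from. The capitalized-word entity "extractor" below is a deliberately trivial placeholder.

import re
from collections import defaultdict

def build_index(passages):
    ent2pids = defaultdict(set)
    for pid, text in enumerate(passages):
        for ent in set(re.findall(r"\b[A-Z][a-zA-Z]+\b", text)):
            ent2pids[ent].add(pid)              # entity -> passage edges only
    return ent2pids

def retrieve(query, ent2pids):
    hits = set()
    for ent in set(re.findall(r"\b[A-Z][a-zA-Z]+\b", query)):
        hits |= ent2pids.get(ent, set())        # one hop from query entities
    return hits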
Lost in the Middle: An Emergent Property from Information Retrieval Demands in LLMs

Demonstrates that lost-in-the-middle behavior in LLMs emerges from adapting to different information retrieval demands during training rather than being a flaw.

📝 arxiv.org/abs/2510.10276
Hierarchical LoRA MoE for Efficient CTR Model Scaling

Meta proposes a hierarchical LoRA mixture of experts framework enabling parameter-efficient scaling for CTR prediction, achieving 0.20% AUC improvement with 18.5% FLOPs reduction.

📝 arxiv.org/abs/2510.10432
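
The parameter efficiency comes from experts being low-rank deltas over a shared dense layer, so widening the mixture barely grows FLOPs. A single flat LoRA-MoE block as a sketch; the hierarchical stacking and routing details are in the paper, not here.

import torch
import torch.nn as nn

class LoRAMoE(nn.Module):
    def __init__(self, d, rank=4, n_experts=4):
        super().__init__()
        self.shared = nn.Linear(d, d)                         # dense base layer
        self.A = nn.Parameter(torch.randn(n_experts, rank, d) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, d, rank))
        self.gate = nn.Linear(d, n_experts)

    def forward(self, x):                                     # x: (B, d)
        w = torch.softmax(self.gate(x), dim=-1)               # (B, E)
        h = torch.einsum("erd,bd->ber", self.A, x)            # low-rank down
        delta = torch.einsum("edr,ber->bed", self.B, h)       # low-rank up
        return self.shared(x) + (w.unsqueeze(-1) * delta).sum(dim=1)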
VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering

Introduces a hybrid framework combining multimodal preprocessing, tripartite retrieval, and two-stage domain-to-entity re-ranking for financial question answering.

📝 arxiv.org/abs/2510.10828
👨🏽‍💻 github.com/simplew4y/Ve...
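
The two-stage re-ranker is the most transferable piece: shortlist by domain first, then re-rank within surviving domains at entity level. Naive placeholder scoring below, not the paper's models.

def two_stage_rerank(query_entities, candidates, top_domains=2):
    """query_entities: set of entity strings from the query.
    candidates: dicts with 'domain', 'entities', and a retrieval 'score'."""
    by_domain = {}
    for c in candidates:
        by_domain.setdefault(c["domain"], []).append(c)
    # Stage 1: keep the domains whose best candidate scores highest.
    kept = sorted(by_domain.values(),
                  key=lambda cs: -max(c["score"] for c in cs))[:top_domains]
    pool = [c for cs in kept for c in cs]
    # Stage 2: entity-overlap re-ranking inside the surviving domains.
    return sorted(pool, key=lambda c: -len(query_entities & set(c["entities"])))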
Decoupled Multimodal Fusion for User Interest Modeling in Click-Through Rate Prediction

Alibaba proposes a framework that enables fine-grained interactions between ID-based and multimodal representations through decoupled target-aware attention.

📝 arxiv.org/abs/2510.11066
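
"Decoupled" target-aware attention, schematically: attend over the behavior sequence twice, once in ID space and once in multimodal space, and only then fuse the pooled interests. The concat fusion at the end is my simplification.

import torch
import torch.nn.functional as F

def target_attention(target, seq):
    """target: (B, d), seq: (B, L, d) -> pooled interest (B, d)."""
    scores = torch.einsum("bd,bld->bl", target, seq) / seq.size(-1) ** 0.5
    return torch.einsum("bl,bld->bd", F.softmax(scores, dim=-1), seq)

def decoupled_interest(tgt_id, seq_id, tgt_mm, seq_mm):
    id_interest = target_attention(tgt_id, seq_id)   # collaborative signal
    mm_interest = target_attention(tgt_mm, seq_mm)   # content signal
    return torch.cat([id_interest, mm_interest], dim=-1)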