Sumit
@reachsumit.com
190 followers 36 following 1.8K posts
Senior MLE at Meta. Trying to keep up with the Information Retrieval domain! Blog: https://blog.reachsumit.com/ Newsletter: https://recsys.substack.com/
RAG-Anything: All-in-One RAG Framework

Introduces a unified framework enabling comprehensive knowledge retrieval across all modalities through dual-graph construction and cross-modal hybrid retrieval.

📝 arxiv.org/abs/2510.12323
👨🏽‍💻 github.com/HKUDS/RAG-An...
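
A minimal sketch of what cross-modal hybrid retrieval can look like: fuse a dense similarity score with a graph-adjacency score over multimodal chunks. Illustrative only; the fusion weight, score functions, and graph layout are my assumptions, not the paper's code.

import numpy as np

def hybrid_scores(query_vec, chunk_vecs, adjacency, seed_ids, alpha=0.7):
    """chunk_vecs: (N, d) unit-normalized embeddings of multimodal chunks.
    adjacency: (N, N) 0/1 knowledge-graph matrix over the same chunks.
    seed_ids: chunk indices already matched on the graph side."""
    dense = chunk_vecs @ query_vec                 # cosine similarity
    graph = adjacency[:, seed_ids].max(axis=1)     # 1 if a neighbor of a seed
    return alpha * dense + (1 - alpha) * graph     # late fusion of both signals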
A Longitudinal Study on Different Annotator Feedback Loops in Complex RAG Tasks

Compares internal and external annotator groups over one year, finding that closer feedback loops produce higher-quality data but with reduced quantity and diversity.

📝 arxiv.org/abs/2510.11897
Reinforced Preference Optimization for Recommendation

Introduces an RL framework for LLM-based recommenders that uses constrained beam search and ranking rewards to improve negative sampling and ranking performance.

📝 arxiv.org/abs/2510.12211
👨🏽‍💻 github.com/sober-clever...
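
A toy version of the ranking-reward idea: score the ground-truth item against hard negatives surfaced by constrained beam search over valid item ids. The reward shape and margin below are assumptions, not the paper's objective.

import torch

def ranking_reward(pos_logprob, neg_logprobs, margin=0.1):
    """pos_logprob: scalar log-likelihood of the ground-truth item id.
    neg_logprobs: (K,) log-likelihoods of beam-searched hard negatives."""
    # Fraction of negatives the positive beats by at least `margin`.
    return (pos_logprob - neg_logprobs > margin).float().mean()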
Simple Projection Variants Improve ColBERT Performance

Shows that replacing ColBERT's single-layer linear projection with deeper FFNs featuring residual connections and upscaled intermediate projections improves retrieval performance.

📝 arxiv.org/abs/2510.12327
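
The change is easy to picture in PyTorch: replace the single nn.Linear that maps token states down to ColBERT's low output dimension with a residual FFN. Widths below are illustrative, not the paper's exact configuration.

import torch.nn as nn

class ResidualFFNProjection(nn.Module):
    def __init__(self, hidden=768, inter=3072, out=128):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(hidden, inter),      # upscaled intermediate projection
            nn.GELU(),
            nn.Linear(inter, hidden),
        )
        self.down = nn.Linear(hidden, out) # per-token down-projection

    def forward(self, token_states):
        return self.down(token_states + self.ffn(token_states))  # residual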
SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression

Alibaba proposes Sequential Matryoshka Embedding Compression to reduce gradient variance during training and adaptively select important embedding dimensions.

📝 arxiv.org/abs/2510.12474
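
The Matryoshka part is easy to sketch: a contrastive loss computed on nested embedding prefixes. Training one truncation length per stage (rather than summing all prefix losses at once) is my loose reading of "sequential"; the actual schedule is the paper's, not shown here.

import torch
import torch.nn.functional as F

def prefix_loss(q, d, dim, tau=0.05):
    """In-batch contrastive loss using only the first `dim` dimensions."""
    qs = F.normalize(q[:, :dim], dim=-1)
    ds = F.normalize(d[:, :dim], dim=-1)
    logits = qs @ ds.t() / tau
    return F.cross_entropy(logits, torch.arange(q.size(0)))

# Sequential stages instead of one summed multi-prefix loss:
#   for dim in (64, 128, 256, 768): optimize prefix_loss(q, d, dim)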
SMILE: SeMantic Ids Enhanced CoLd Item Representation for Click-through Rate Prediction in E-commerce SEarch

Kuaishou uses RQ-OPQ encoding to enhance cold-start item representations by aligning collaborative signals with semantic information.

📝 arxiv.org/abs/2510.12604
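
The "RQ" half is a residual quantizer: encode an item embedding as a short tuple of codebook ids, coarse to fine. Toy version below; the OPQ rotation and the collaborative-semantic alignment are omitted, and sizes are made up.

import numpy as np

def rq_encode(x, codebooks):
    """x: (d,) item embedding. codebooks: list of (K, d) arrays."""
    ids, residual = [], x.copy()
    for cb in codebooks:                            # one level per codebook
        j = int(((residual - cb) ** 2).sum(axis=1).argmin())
        ids.append(j)
        residual -= cb[j]                           # quantize what remains
    return ids                                      # the semantic-id tuple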
The Role of Parametric Injection: A Systematic Study of Parametric Retrieval-Augmented Generation

Finds that parametric representations capture only partial semantic information but can enhance document understanding when combined with textual context.

📝 arxiv.org/abs/2510.12668
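
"Parametric" here means the document rides along as a weight delta rather than (or in addition to) prompt text. A toy LoRA-style injection into one linear layer, not PRAG's actual machinery:

import torch
import torch.nn as nn

class InjectedLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8):
        super().__init__()
        self.base = base                                       # frozen LLM weight
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # doc-specific
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # doc-specific

    def forward(self, x):
        return self.base(x) + x @ self.A.t() @ self.B.t()      # Wx + BAx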
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

ByteDance presents a foundation model that supports multifaceted multimodal retrieval and classification by accommodating arbitrary modality inputs, including text, vision, and audio.

📝 arxiv.org/abs/2510.12709
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search

Apple introduces a multimodal LLM capable of on-demand, multi-turn web searches with dynamically crafted queries for image and text search tools.

📝 arxiv.org/abs/2510.12801
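
The control flow is the interesting bit: per turn, the model decides whether to search by image, search by text with a self-written query, or answer. Everything below is a hypothetical driver; mllm_step, image_search, and text_search are placeholder callables, not the paper's API.

def answer(question, image, mllm_step, image_search, text_search, max_turns=4):
    context = []
    for _ in range(max_turns):
        act = mllm_step(question, image, context)          # model picks an action
        if act["type"] == "image_search":
            context.append(image_search(act["crop"]))      # search on an image crop
        elif act["type"] == "text_search":
            context.append(text_search(act["query"]))      # dynamically crafted query
        else:
            return act["text"]                             # "answer" action
    return mllm_step(question, image, context)["text"]     # forced final answer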
Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation

Surveys table question answering approaches in the LLM era, categorizing task setups, modeling strategies, and evaluation methods.

📝 arxiv.org/abs/2510.09671
HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks

Introduces a framework to evaluate human performance on text embedding benchmarks, revealing that humans rank 4th among models with 77.6% average performance across 16 MTEB tasks.

📝 arxiv.org/abs/2510.10062
Domain-Specific Data Generation Framework for RAG Adaptation

Introduces a scalable framework for generating domain-grounded question-answer-context triples to enhance RAG system adaptation across diverse domains and components.

📝 arxiv.org/abs/2510.11217
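
A stripped-down version of the triple-generation loop; llm is any text-completion callable you bring, and the prompts and quality gate are simplified stand-ins for the paper's pipeline.

def make_triples(chunks, llm):
    triples = []
    for chunk in chunks:
        q = llm(f"Write one question answerable only from this passage:\n{chunk}")
        a = llm(f"Context:\n{chunk}\n\nQuestion: {q}\nAnswer concisely:")
        if a and a.strip().lower() not in {"unknown", "n/a"}:  # crude filter
            triples.append({"question": q, "answer": a, "context": chunk})
    return triples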
MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest

Pinterest introduces a two-tower architecture that unifies multiple ad domains and optimization tasks using mixture-of-experts, replacing 9 production models.

📝 arxiv.org/abs/2510.09857
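
Schematically, one tower with shared experts and per-task heads replaces a zoo of single-task models. Sizes, task names, and the gating scheme below are illustrative, not Pinterest's production setup.

import torch
import torch.nn as nn

class MoETower(nn.Module):
    def __init__(self, d_in, d_out=64, n_experts=4, tasks=("ctr", "ctcvr")):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, 128), nn.ReLU()) for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_in, n_experts)
        self.heads = nn.ModuleDict({t: nn.Linear(128, d_out) for t in tasks})

    def forward(self, x, task):
        w = torch.softmax(self.gate(x), dim=-1)                       # (B, E)
        mix = sum(w[:, i:i + 1] * e(x) for i, e in enumerate(self.experts))
        return self.heads[task](mix)                  # task-specific embedding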
LinearRAG: Linear Graph Retrieval Augmented Generation on Large-Scale Corpora

Constructs relation-free hierarchical graphs using lightweight entity extraction, reducing indexing time by over 77% while outperforming existing GraphRAG methods.

📝 arxiv.org/abs/2510.10114
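
"Relation-free" means the graph is just entity-to-passage links with no LLM relation-extraction pass, which is where the indexing savings come from. The capitalized-word entity "extractor" below is a deliberately trivial placeholder.

import re
from collections import defaultdict

def build_index(passages):
    ent2pids = defaultdict(set)
    for pid, text in enumerate(passages):
        for ent in set(re.findall(r"\b[A-Z][a-zA-Z]+\b", text)):
            ent2pids[ent].add(pid)              # entity -> passage edges only
    return ent2pids

def retrieve(query, ent2pids):
    hits = set()
    for ent in set(re.findall(r"\b[A-Z][a-zA-Z]+\b", query)):
        hits |= ent2pids.get(ent, set())        # one hop from query entities
    return hits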
Lost in the Middle: An Emergent Property from Information Retrieval Demands in LLMs

Demonstrates that lost-in-the-middle behavior in LLMs emerges from adapting to different information retrieval demands during training rather than being a flaw.

📝 arxiv.org/abs/2510.10276
Hierarchical LoRA MoE for Efficient CTR Model Scaling

Meta proposes a hierarchical LoRA mixture of experts framework enabling parameter-efficient scaling for CTR prediction, achieving 0.20% AUC improvement with 18.5% FLOPs reduction.

📝 arxiv.org/abs/2510.10432
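
The parameter efficiency comes from experts being low-rank deltas over a shared dense layer, so widening the mixture barely grows FLOPs. A single flat LoRA-MoE block as a sketch; the hierarchical stacking and routing details are in the paper, not here.

import torch
import torch.nn as nn

class LoRAMoE(nn.Module):
    def __init__(self, d, rank=4, n_experts=4):
        super().__init__()
        self.shared = nn.Linear(d, d)                         # dense base layer
        self.A = nn.Parameter(torch.randn(n_experts, rank, d) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, d, rank))
        self.gate = nn.Linear(d, n_experts)

    def forward(self, x):                                     # x: (B, d)
        w = torch.softmax(self.gate(x), dim=-1)               # (B, E)
        h = torch.einsum("erd,bd->ber", self.A, x)            # low-rank down
        delta = torch.einsum("edr,ber->bed", self.B, h)       # low-rank up
        return self.shared(x) + (w.unsqueeze(-1) * delta).sum(dim=1)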
VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering

Introduces a hybrid framework combining multimodal preprocessing, tripartite retrieval, and two-stage domain-to-entity re-ranking for financial question answering.

📝 arxiv.org/abs/2510.10828
👨🏽‍💻 github.com/simplew4y/Ve...
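
The two-stage re-ranker is the most transferable piece: shortlist by domain first, then re-rank within surviving domains at entity level. Naive placeholder scoring below, not the paper's models.

def two_stage_rerank(query_entities, candidates, top_domains=2):
    """query_entities: set of entity strings from the query.
    candidates: dicts with 'domain', 'entities', and a retrieval 'score'."""
    by_domain = {}
    for c in candidates:
        by_domain.setdefault(c["domain"], []).append(c)
    # Stage 1: keep the domains whose best candidate scores highest.
    kept = sorted(by_domain.values(),
                  key=lambda cs: -max(c["score"] for c in cs))[:top_domains]
    pool = [c for cs in kept for c in cs]
    # Stage 2: entity-overlap re-ranking inside the surviving domains.
    return sorted(pool, key=lambda c: -len(query_entities & set(c["entities"])))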
Decoupled Multimodal Fusion for User Interest Modeling in Click-Through Rate Prediction

Alibaba proposes a framework that enables fine-grained interactions between ID-based and multimodal representations through decoupled target-aware attention.

📝 arxiv.org/abs/2510.11066
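
"Decoupled" target-aware attention, schematically: attend over the behavior sequence twice, once in ID space and once in multimodal space, and only then fuse the pooled interests. The concat fusion at the end is my simplification.

import torch
import torch.nn.functional as F

def target_attention(target, seq):
    """target: (B, d), seq: (B, L, d) -> pooled interest (B, d)."""
    scores = torch.einsum("bd,bld->bl", target, seq) / seq.size(-1) ** 0.5
    return torch.einsum("bl,bld->bd", F.softmax(scores, dim=-1), seq)

def decoupled_interest(tgt_id, seq_id, tgt_mm, seq_mm):
    id_interest = target_attention(tgt_id, seq_id)   # collaborative signal
    mm_interest = target_attention(tgt_mm, seq_mm)   # content signal
    return torch.cat([id_interest, mm_interest], dim=-1)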