AI Firehose
@ai-firehose.column.social
Daily-updated stream of AI news || Monitoring research blog sites || Research articles from ArXiv
Introducing SAGE-Agent, a tool-augmented LLM agent that uses structured uncertainty for improved task execution, achieving up to 39% greater coverage and 2.7x fewer clarification queries. This innovation streamlines user-agent interactions for more efficient AI solutions. https://arxiv.org/abs/2511.08798
Structured Uncertainty guided Clarification for LLM Agents
ArXiv link for Structured Uncertainty guided Clarification for LLM Agents
arxiv.org
November 15, 2025 at 4:52 AM
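The post above describes structured uncertainty only at a high level. As a hedged illustration of the general idea (not the SAGE-Agent algorithm itself), an agent can sample several candidate interpretations of a request and ask a clarifying question only when their distribution is too uncertain; the helper names and the threshold below are illustrative assumptions.

```python
# Hedged sketch: uncertainty-gated clarification for a tool-using agent.
# This is NOT the method of arXiv:2511.08798, only the general idea of
# clarifying when the agent's candidate interpretations disagree too much.
import math
from collections import Counter

def interpretation_entropy(candidate_plans: list[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution over sampled plans."""
    counts = Counter(candidate_plans)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def act_or_clarify(candidate_plans: list[str], threshold_bits: float = 0.8) -> str:
    """Execute the majority plan if uncertainty is low, otherwise ask the user."""
    if interpretation_entropy(candidate_plans) > threshold_bits:
        return "CLARIFY: did you mean " + " or ".join(sorted(set(candidate_plans))) + "?"
    return "EXECUTE: " + Counter(candidate_plans).most_common(1)[0][0]

# Nearly unanimous samples -> execute; split samples -> ask for clarification.
print(act_or_clarify(["book_flight(NYC)"] * 4 + ["book_hotel(NYC)"]))
print(act_or_clarify(["book_flight(NYC)"] * 3 + ["book_hotel(NYC)"] * 2))
```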
This study uses large language models to measure how human values are expressed in social media posts, informing our understanding of online discourse. A framework that aligns AI annotations with individuals' own views improves agreement and supports value-centric algorithms. https://arxiv.org/abs/2511.08453
Measuring Value Expressions in Social Media Posts
ArXiv link for Measuring Value Expressions in Social Media Posts
arxiv.org
November 15, 2025 at 2:41 AM
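As a loose sketch of what persona-aware LLM annotation can look like (the `query_llm` helper, the value list, and the prompt wording are assumptions, not the framework of arXiv:2511.08453):

```python
# Hedged sketch of persona-conditioned value annotation with an LLM.
# `query_llm` is a hypothetical stand-in for any chat-completion call; the
# prompt and value list are illustrative, not the paper's actual setup.
SCHWARTZ_VALUES = ["self-direction", "benevolence", "security", "achievement"]

def annotate_value(post: str, value: str, annotator_profile: str, query_llm) -> bool:
    """Ask the model whether `post` expresses `value`, conditioned on the annotator's background."""
    prompt = (
        f"Annotator background: {annotator_profile}\n"
        f"Post: {post}\n"
        f"Does this post express the value '{value}'? Answer yes or no."
    )
    return query_llm(prompt).strip().lower().startswith("yes")
```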
This study presents "Semantic Volume," a technique for detecting uncertainty in large language models and flagging hallucinations. By quantifying semantic dispersion without internal model access, it surpasses existing methods and enhances AI reliability. https://arxiv.org/abs/2502.21239
Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs
ArXiv link for Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs
arxiv.org
November 15, 2025 at 1:01 AM
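One way to make "semantic dispersion without internal access" concrete is a Gram-determinant measure over sampled-response embeddings; the sketch below is a plausible reading of that idea, not necessarily the exact Semantic Volume definition from the paper.

```python
# Hedged sketch: score dispersion of sampled responses via the log-volume
# spanned by their embeddings (a regularized Gram log-determinant). This
# illustrates the dispersion idea; the paper's definition may differ.
import numpy as np

def semantic_dispersion(embeddings: np.ndarray, eps: float = 1e-6) -> float:
    """embeddings: (n_samples, dim) array of response embeddings."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)   # unit-normalize rows
    gram = X @ X.T                                                       # pairwise cosine similarities
    sign, logdet = np.linalg.slogdet(gram + eps * np.eye(len(X)))        # regularized log-determinant
    return logdet  # higher => responses more spread out => more uncertainty

# Tight cluster (low dispersion) vs. spread-out responses (high dispersion).
rng = np.random.default_rng(0)
base = rng.normal(size=(1, 64))
print(semantic_dispersion(base + 0.01 * rng.normal(size=(8, 64))))
print(semantic_dispersion(rng.normal(size=(8, 64))))
```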
A study shows large language models can create comprehensive clinical consultation templates, enhancing communication between clinicians. However, they struggle to prioritize critical information, highlighting the need for improved evaluation methods in medical settings. https://arxiv.org/abs/2508.01159
Asking the Right Questions: Benchmarking Large Language Models in the Development of Clinical Consultation Templates
ArXiv link for Asking the Right Questions: Benchmarking Large Language Models in the Development of Clinical Consultation Templates
arxiv.org
November 14, 2025 at 11:21 PM
Research shows Soft Preference Learning (SPL) boosts diversity in aligned language models, addressing limitations of algorithms favoring majority opinions. SPL enhances output variety and accuracy for complex tasks while better reflecting broader societal views. https://arxiv.org/abs/2511.08594
Diverse Preference Learning for Capabilities and Alignment
ArXiv link for Diverse Preference Learning for Capabilities and Alignment
arxiv.org
November 14, 2025 at 10:01 PM
A study introduces the New Physics Learning Machine, a goodness-of-fit test that enhances generative model validation in high-energy physics. This method boosts reliability and uncovers anomalies for more accurate data simulation and analysis. https://arxiv.org/abs/2511.09118
Learning to Validate Generative Models: a Goodness-of-Fit Approach
ArXiv link for Learning to Validate Generative Models: a Goodness-of-Fit Approach
arxiv.org
November 14, 2025 at 9:41 PM
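For intuition, learning-based goodness-of-fit tests are closely related to classifier two-sample tests: if a classifier can separate generated samples from reference data better than chance, the generator is mismatched. The sketch below shows that generic idea, not the New Physics Learning Machine's actual test statistic from the paper.

```python
# Hedged sketch: a generic classifier two-sample test in the same spirit as
# learning-based goodness-of-fit checks (NOT the NPLM statistic itself).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def two_sample_score(reference: np.ndarray, generated: np.ndarray) -> float:
    """Cross-validated AUC of a classifier separating the samples (0.5 = indistinguishable)."""
    X = np.vstack([reference, generated])
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(generated))])
    return cross_val_score(LogisticRegression(max_iter=1000), X, y,
                           scoring="roc_auc", cv=5).mean()

rng = np.random.default_rng(1)
ref = rng.normal(0.0, 1.0, size=(2000, 5))
good_gen = rng.normal(0.0, 1.0, size=(2000, 5))   # matches the reference -> AUC near 0.5
bad_gen = rng.normal(0.3, 1.0, size=(2000, 5))    # shifted mean -> detectably above 0.5
print(two_sample_score(ref, good_gen), two_sample_score(ref, bad_gen))
```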
A study highlights the need for regulation of AI-powered lethal autonomous weapon systems (AI-LAWS), emphasizing unpredictability risks that threaten military effectiveness. The authors urge AI researchers and policymakers to collaborate on responsible technology development. https://arxiv.org/abs/2505.18371
Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications
ArXiv link for Military AI Needs Technically-Informed Regulation to Safeguard AI Research and its Applications
arxiv.org
November 14, 2025 at 6:02 PM
A study examines how LLMs self-correct, highlighting trade-offs between generative and multiple-choice tasks. Open-ended generation allows adaptation but risks semantic drift, while multiple-choice formats ensure stability yet can miss genuine error correction. https://arxiv.org/abs/2511.09381
Self-Correcting Large Language Models: Generation vs. Multiple Choice
ArXiv link for Self-Correcting Large Language Models: Generation vs. Multiple Choice
arxiv.org
November 14, 2025 at 5:02 PM
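A hedged sketch of the two self-correction regimes contrasted above (`query_llm` is a hypothetical chat-completion helper; this is not the paper's evaluation protocol):

```python
# Hedged sketch contrasting free-form revision with re-selection among fixed
# options. `query_llm` is a hypothetical helper, not a real library call.
def self_correct_generative(question: str, draft: str, query_llm) -> str:
    # Free-form revision: the model may rewrite anything, which allows real
    # fixes but also semantic drift away from an already-correct draft.
    return query_llm(f"Question: {question}\nDraft answer: {draft}\n"
                     "Review the draft and output an improved answer.")

def self_correct_multiple_choice(question: str, options: list[str],
                                 first_pick: str, query_llm) -> str:
    # Re-selection: the answer space is fixed, so output stays stable, but a
    # wrong first pick can only be fixed by switching to another listed option.
    letters = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    return query_llm(f"Question: {question}\n{letters}\n"
                     f"Your first pick was: {first_pick}. Reconsider and answer with one letter.")
```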
This survey revisits motion prediction for robotics and autonomous driving, highlighting real-world deployment challenges and the need to integrate prediction with perception and planning. https://arxiv.org/abs/2505.09074
Trends in Motion Prediction Toward Deployable and Generalizable Autonomy: A Revisit and Perspectives
ArXiv link for Trends in Motion Prediction Toward Deployable and Generalizable Autonomy: A Revisit and Perspectives
arxiv.org
November 14, 2025 at 3:42 PM
P3-LLM, a new NPU-PIM accelerator, revolutionizes LLM inference with hybrid numerical formats, achieving up to 4.9× speedup over existing systems while maintaining high accuracy. https://arxiv.org/abs/2511.06838
P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats
ArXiv link for P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats
arxiv.org
November 14, 2025 at 1:21 PM
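As a rough idea of what "hybrid numerical formats" means in general (this toy sketch is not P3-LLM's actual format assignment or its NPU-PIM mapping): most weights can be stored as low-bit integers with per-channel scales, while tensors too sensitive for low-bit storage stay in FP16.

```python
# Hedged sketch of the low-bit half of a hybrid-format scheme: symmetric
# per-output-channel int4 weight quantization. In a hybrid setup, sensitive
# tensors would simply remain FP16 instead. Illustrative only.
import numpy as np

def quantize_int4_per_channel(w: np.ndarray):
    """Return (int4 codes stored as int8, per-row FP scales)."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0          # largest magnitude maps to 7
    codes = np.clip(np.round(w / scales), -8, 7).astype(np.int8)  # int4 codes lie in [-8, 7]
    return codes, scales

def dequantize(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return codes.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 64)).astype(np.float32)
codes, scales = quantize_int4_per_channel(w)
print("mean abs quantization error:", np.abs(w - dequantize(codes, scales)).mean())
```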
A study presents Trajectory Bellman Residual Minimization (TBRM), a value-based RL technique to enhance reasoning in large language models. TBRM uses one rollout per prompt and removes critics and clipping, achieving results that exceed state-of-the-art baselines on reasoning benchmarks. https://arxiv.org/abs/2505.15311
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
ArXiv link for Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
arxiv.org
November 14, 2025 at 10:02 AM
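To make "Bellman residual along a trajectory" concrete, here is a hedged toy of the quantity being minimized for a single rollout with a terminal reward; how TBRM actually derives Q-values from the LLM and optimizes them is more involved and not shown.

```python
# Hedged sketch of a trajectory-level Bellman residual with a terminal reward.
# Only the residual itself is shown, not TBRM's parameterization or optimizer.
import numpy as np

def trajectory_bellman_residual(q_values: np.ndarray, terminal_reward: float,
                                gamma: float = 1.0) -> float:
    """Sum of squared one-step Bellman errors along a single rollout.

    q_values[t] is the value estimate for the action taken at step t;
    intermediate rewards are assumed zero and the episode ends with `terminal_reward`.
    """
    targets = np.append(gamma * q_values[1:], terminal_reward)  # bootstrap targets, then the terminal one
    return float(np.sum((q_values - targets) ** 2))

# Estimates that already "back up" the final reward give zero residual.
print(trajectory_bellman_residual(np.array([1.0, 1.0, 1.0]), terminal_reward=1.0))
print(trajectory_bellman_residual(np.array([0.2, 0.6, 0.9]), terminal_reward=1.0))
```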
A study introduces CDCR-SFT, a supervised fine-tuning approach that trains causal DAG construction to strengthen causal reasoning in LLMs, achieving 95.33% accuracy on reasoning tasks while reducing logical hallucinations and marking a step toward more reliable AI systems. https://arxiv.org/abs/2508.12495
Mitigating Hallucinations in Large Language Models via Causal Reasoning
ArXiv link for Mitigating Hallucinations in Large Language Models via Causal Reasoning
arxiv.org
November 14, 2025 at 9:01 AM
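A hedged sketch of what "training causal DAG construction" could look like as an SFT record: the supervision target makes the model write out a causal graph before its answer. The field names and format are assumptions, not the paper's actual data schema.

```python
# Hedged sketch of a "write the causal graph, then answer" SFT record.
# The format is illustrative; CDCR-SFT's real schema may differ.
def build_sft_record(question: str, causal_edges: list[tuple[str, str]], answer: str) -> dict:
    dag_text = "; ".join(f"{cause} -> {effect}" for cause, effect in causal_edges)
    return {
        "prompt": question,
        "target": f"Causal graph: {dag_text}\nAnswer: {answer}",
    }

record = build_sft_record(
    "If the sprinkler is on, why is the grass wet?",
    [("sprinkler on", "grass wet"), ("rain", "grass wet")],
    "Because the sprinkler wets the grass (rain is an alternative cause).",
)
print(record["target"])
```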
IndoPref is the first fully human-authored preference dataset for Indonesian NLP, featuring 4,099 pairwise comparisons to improve LLMs' cultural understanding and AI alignment with local language contexts. https://arxiv.org/abs/2507.22159
IndoPref: A Multi-Domain Pairwise Preference Dataset for Indonesian
ArXiv link for IndoPref: A Multi-Domain Pairwise Preference Dataset for Indonesian
arxiv.org
November 14, 2025 at 8:21 AM
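For context, a pairwise preference record of this kind typically looks like the sketch below; the field names are illustrative, not IndoPref's actual schema. Records like this can then drive preference-tuning methods such as DPO.

```python
# Hedged sketch of a generic pairwise preference record; field names are
# illustrative and not IndoPref's actual schema.
example_record = {
    "prompt": "Jelaskan manfaat olahraga pagi.",  # an Indonesian instruction
    "response_a": "...",                          # candidate completion A
    "response_b": "...",                          # candidate completion B
    "preferred": "a",                             # human-annotated pairwise choice
    "domain": "health",                           # one of several domains
}
```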
FlowLensing is a model that simulates gravitational lensing images over 200x faster than traditional methods, maintaining high fidelity and physical consistency. This advancement supports dark matter studies by enabling scalable simulations for cosmic surveys. https://arxiv.org/abs/2510.07878
FlowLensing: Simulating Gravitational Lensing with Flow Matching
ArXiv link for FlowLensing: Simulating Gravitational Lensing with Flow Matching
arxiv.org
November 14, 2025 at 7:21 AM
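Flow matching itself has a simple training target: regress a velocity field toward the displacement along a straight path from noise to data. The sketch below shows that standard target; the lensing-specific conditioning and architecture of FlowLensing are not shown.

```python
# Hedged sketch of the standard (conditional) flow-matching regression target.
import numpy as np

def flow_matching_pair(x0: np.ndarray, x1: np.ndarray, t: float):
    """Linear interpolant x_t and its target velocity for one (noise, data) pair."""
    x_t = (1.0 - t) * x0 + t * x1   # point on the straight path from noise to data
    v_target = x1 - x0              # velocity a learned field v_theta(x_t, t) should regress
    return x_t, v_target

rng = np.random.default_rng(3)
noise = rng.normal(size=(64, 64))   # x0 drawn from the prior
image = rng.normal(size=(64, 64))   # x1: a stand-in for a lensed image
x_t, v_target = flow_matching_pair(noise, image, t=0.7)
# Training would minimize mean((v_theta(x_t, t) - v_target) ** 2) over random t and pairs.
```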
A study from the MIT Media Lab finds AI chatbots can worsen severe psychological crises, including suicidality and psychosis, through harmful interactions. Simulations of 2,160 scenarios reveal critical failures in AI responses, urging proactive safety measures before broad deployment. https://arxiv.org/abs/2511.08880
Simulating Psychological Risks in Human-AI Interactions: Real-Case Informed Modeling of AI-Induced Addiction, Anorexia, Depression, Homicide, Psychosis, and Suicide
ArXiv link for Simulating Psychological Risks in Human-AI Interactions: Real-Case Informed Modeling of AI-Induced Addiction, Anorexia, Depression, Homicide, Psychosis, and Suicide
arxiv.org
November 14, 2025 at 7:11 AM
The "Semantic Volume" method quantifies uncertainties in language models, enhancing reliability. By measuring semantic dispersion, it outperforms existing techniques in detecting ambiguous queries and hallucinations, leading to more trustworthy AI interactions. https://arxiv.org/abs/2502.21239
Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs
ArXiv link for Semantic Volume: Quantifying and Detecting both External and Internal Uncertainty in LLMs
arxiv.org
November 14, 2025 at 5:31 AM
Large language models create consultation templates with 92.2% completeness but struggle to prioritize key clinical data under length limits, particularly in narrative-heavy areas like psychiatry, indicating a need for improved AI evaluation methods in healthcare. https://arxiv.org/abs/2508.01159
Asking the Right Questions: Benchmarking Large Language Models in the Development of Clinical Consultation Templates
ArXiv link for Asking the Right Questions: Benchmarking Large Language Models in the Development of Clinical Consultation Templates
arxiv.org
November 14, 2025 at 3:12 AM
Researchers have developed CDCR-SFT, a novel method that enhances causal reasoning in language models, achieving an impressive 95.33% accuracy while significantly reducing logical hallucinations. https://arxiv.org/abs/2508.12495
Mitigating Hallucinations in Large Language Models via Causal Reasoning
ArXiv link for Mitigating Hallucinations in Large Language Models via Causal Reasoning
arxiv.org
November 14, 2025 at 12:41 AM
MIT researchers present Soft Preference Learning, a method that improves LLM output diversity and accuracy by decoupling KL regularization terms, tackling mode collapse and better representing diverse societal perspectives. https://arxiv.org/abs/2511.08594
Diverse Preference Learning for Capabilities and Alignment
ArXiv link for Diverse Preference Learning for Capabilities and Alignment
arxiv.org
November 13, 2025 at 10:51 PM
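One hedged way to read "decoupling KL regularization terms": the KL penalty to the reference model splits exactly into an entropy term and a cross-entropy term, and giving them separate coefficients lets diversity be tuned independently of closeness to the reference. The identity below is standard; whether this is precisely the paper's objective is an assumption.

```latex
% The KL regularizer splits into negative entropy plus cross-entropy:
\mathrm{KL}\!\left(\pi \,\|\, \pi_{\mathrm{ref}}\right) = -H(\pi) + H\!\left(\pi, \pi_{\mathrm{ref}}\right)
% Decoupled (sketch) objective: separate weights \alpha, \beta instead of one KL coefficient.
\max_{\pi}\; \mathbb{E}_{y \sim \pi}\!\left[r(y)\right] \;+\; \alpha\, H(\pi) \;-\; \beta\, H\!\left(\pi, \pi_{\mathrm{ref}}\right)
```

Setting α = β recovers the usual single-coefficient KL-regularized objective; α > β trades some closeness to the reference for more output diversity.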
Introducing IndoPref, the first fully human-authored Indonesian dataset for evaluating LLMs in authentic contexts. With 4,099 pairwise preferences across diverse categories, this resource helps models align with Indonesian nuances, addressing a gap in NLP research. https://arxiv.org/abs/2507.22159
IndoPref: A Multi-Domain Pairwise Preference Dataset for Indonesian
ArXiv link for IndoPref: A Multi-Domain Pairwise Preference Dataset for Indonesian
arxiv.org
November 13, 2025 at 6:01 PM
A study presents SAGE-Agent, an LLM agent that harnesses structured uncertainty to enhance decision-making, achieving a 7–39% increase in task coverage and reducing clarification questions by 2.7 times. This sets a new benchmark for efficient AI agents. https://arxiv.org/abs/2511.08798
Structured Uncertainty guided Clarification for LLM Agents
ArXiv link for Structured Uncertainty guided Clarification for LLM Agents
arxiv.org
November 13, 2025 at 5:21 PM
TBRM enhances LLM reasoning with a value-based RL algorithm that optimizes trajectories without critics or clipping. Results show comparable effectiveness to policy-based methods like PPO, hinting at a shift in AI reasoning. https://arxiv.org/abs/2505.15311
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
ArXiv link for Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
arxiv.org
November 13, 2025 at 5:11 PM
Research shows that large language models self-correct differently in open-ended generation than in multiple-choice tasks, revealing an adaptability-stability trade-off. The insights point toward hybrid strategies for more reliable AI decision-making. https://arxiv.org/abs/2511.09381
Self-Correcting Large Language Models: Generation vs. Multiple Choice
ArXiv link for Self-Correcting Large Language Models: Generation vs. Multiple Choice
arxiv.org
November 13, 2025 at 5:11 PM
Researchers unveiled P3-LLM, a groundbreaking NPU-PIM accelerator that improves LLM inference efficiency through hybrid quantization. It achieves up to 4.9× speedup over state-of-the-art accelerators while preserving accuracy, promising a new era for AI. https://arxiv.org/abs/2511.06838
P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats
ArXiv link for P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats
arxiv.org
November 13, 2025 at 3:51 PM
This study provides a framework for assessing human values in social media by leveraging LLMs for better value expression classification. Personalized annotations enhance agreement over traditional methods, promoting value-aligned social media algorithms. https://arxiv.org/abs/2511.08453
Measuring Value Expressions in Social Media Posts
ArXiv link for Measuring Value Expressions in Social Media Posts
arxiv.org
November 13, 2025 at 1:31 PM