Large Language Models Report Subjective Experience Under Self-Referential Processing
Large language models sometimes produce structured, first-person descriptions that explicitly reference awareness or subjective experience. To better understand this behavior, we investigate one theor...
arxiv.org
November 2, 2025 at 12:07 AM
BADAS: Context Aware Collision Prediction Using Real-World Dashcam Data
Existing collision prediction methods often fail to distinguish between ego-vehicle threats and random accidents not involving the ego vehicle, leading to excessive false alerts in real-world deployme...
arxiv.org
November 2, 2025 at 12:06 AM
Reasoning Models Reason Well, Until They Don't
Large language models (LLMs) have shown significant progress in reasoning tasks. However, recent studies show that transformers and LLMs fail catastrophically once reasoning problems exceed modest com...
arxiv.org
November 1, 2025 at 12:06 AM
Scaling Latent Reasoning via Looped Language Models
Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and...
arxiv.org
November 1, 2025 at 12:05 AM
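The core idea, as the title suggests, is to spend compute on repeated passes through shared weights rather than on emitted chain-of-thought tokens. A minimal sketch of that looped forward pass, assuming an illustrative `SharedBlock`/`n_loops` design rather than the paper's actual architecture:

```python
# Sketch of a looped LM: instead of emitting chain-of-thought tokens, the
# model refines its hidden state by re-applying one shared block of layers
# several times before decoding. Names (SharedBlock, n_loops) are
# illustrative assumptions, not the paper's API.
import torch
import torch.nn as nn

class SharedBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.ln2(x))

class LoopedLM(nn.Module):
    def __init__(self, vocab: int, d_model: int = 256, n_heads: int = 4,
                 n_loops: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.block = SharedBlock(d_model, n_heads)  # one set of weights...
        self.n_loops = n_loops                      # ...applied n_loops times
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):
        x = self.embed(tokens)
        for _ in range(self.n_loops):  # latent "reasoning" iterations
            x = self.block(x)
        return self.head(x)
```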
Language Models are Injective and Hence Invertible
Transformer components such as non-linear activations and normalization are inherently non-injective, suggesting that different inputs could map to the same output and prevent exact recovery of the in...
arxiv.org
October 31, 2025 at 12:06 AM
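The claim invites a simple empirical probe: if the map from prompts to hidden states is injective, distinct prompts should never collide. A hedged sketch of such a collision check on GPT-2 via Hugging Face `transformers`; this is an illustrative test, not the paper's proof or protocol:

```python
# Collision probe: embed many distinct prompts and verify that no two
# final-position hidden states coincide (minimum pairwise distance > 0).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

prompts = [f"The number is {i}." for i in range(50)]
states = []
with torch.no_grad():
    for p in prompts:
        out = model(**tok(p, return_tensors="pt"))
        # summarize each sequence by its final position's hidden state
        states.append(out.last_hidden_state[0, -1])

dists = torch.cdist(torch.stack(states), torch.stack(states))
dists.fill_diagonal_(float("inf"))
print("closest pair distance:", dists.min().item())  # > 0: no collision found
```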
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
Test-time scaling seeks to improve the reasoning performance of large language models (LLMs) by adding computational resources. A prevalent approach within the field is sampling-based test-time scalin...
arxiv.org
October 30, 2025 at 12:06 AM
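Sampling-based test-time scaling here usually means self-consistency: sample several reasoning traces, majority-vote the answers, and let the vote frequencies approximate the model's internal probability over answers, which is the bridge the paper studies. A minimal sketch, where `generate_trace` and `extract_answer` are assumed stand-ins for a real model call and answer parser:

```python
# Self-consistency: sample N traces at temperature > 0, parse each trace's
# final answer, return the majority vote.
from collections import Counter

def self_consistency(prompt: str, generate_trace, extract_answer,
                     n_samples: int = 16) -> str:
    answers = []
    for _ in range(n_samples):
        trace = generate_trace(prompt)         # one sampled CoT
        answers.append(extract_answer(trace))  # e.g. text after "Answer:"
    # empirical answer frequency ~ internal answer probability
    return Counter(answers).most_common(1)[0][0]
```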
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Large language models leverage internet-scale text data, yet embodied AI remains constrained by the prohibitive costs of physical trajectory collection. Desktop environments -- particularly gaming -- ...
arxiv.org
October 30, 2025 at 12:06 AM
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Humans learn abstract concepts through multisensory synergy, and once formed, such representations can often be recalled from a single modality. Inspired by this principle, we introduce Concerto, a mi...
arxiv.org
October 29, 2025 at 12:06 AM
Do LLMs "Feel"? Emotion Circuits Discovery and Control
As the demand for emotional intelligence in large language models (LLMs) grows, a key challenge lies in understanding the internal mechanisms that give rise to emotional expression and in controlling ...
arxiv.org
October 29, 2025 at 12:06 AM
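One common control technique in this line of work is activation steering: adding a fixed direction to a hidden layer's residual stream at inference time. A hedged sketch using a forward hook on GPT-2; the layer index and the random placeholder direction are assumptions, and the paper's circuit-level method may well differ:

```python
# Activation steering sketch: shift one transformer block's hidden states
# along a precomputed "emotion" direction during generation. The direction
# below is a random placeholder standing in for a learned one.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER, SCALE = 6, 4.0
direction = torch.randn(model.config.n_embd)  # placeholder steering vector
direction = direction / direction.norm()

def steer(module, inputs, output):
    hidden = output[0] + SCALE * direction  # shift the residual stream
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(steer)
ids = tok("I opened the letter and", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()
```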
MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
In recent years, large-scale generative models for visual content (e.g., images, videos, and 3D objects/scenes) have made remarkable progress. However, training large-scale video generation m...
arxiv.org
October 29, 2025 at 12:06 AM
Efficient Long-context Language Model Training by Core Attention Disaggregation
We present core attention disaggregation (CAD), a technique that improves long-context large language model training by decoupling the core attention computation, softmax(QK^T)V, from the rest of the ...
arxiv.org
October 29, 2025 at 12:05 AM
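The decomposition itself is easy to show: the core attention kernel softmax(QK^T / sqrt(d))V scales with token count, while the surrounding projections scale with parameter count, which is why CAD can schedule them on separate devices and load-balance them independently. A sketch of the split only, not the paper's scheduler:

```python
# Split an attention layer into its token-bound core and its
# parameter-bound projections, mirroring the CAD decomposition.
import torch
import torch.nn.functional as F

def core_attention(q, k, v):
    # the token-count-dependent part: softmax(QK^T / sqrt(d)) V
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def attention_layer(x, wq, wk, wv, wo):
    # the parameter-heavy projections stay with the main model shards
    q, k, v = x @ wq, x @ wk, x @ wv
    ctx = core_attention(q, k, v)  # CAD would dispatch this elsewhere
    return ctx @ wo
```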
A Definition of AGI
The lack of a concrete definition for Artificial General Intelligence (AGI) obscures the gap between today's specialized AI and human-level cognition. This paper introduces a quantifiable framework to...
arxiv.org
October 28, 2025 at 12:06 AM
1 repost
Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
Language models are increasingly capable, yet still fail at a seemingly simple task of multi-digit multiplication. In this work, we study why, by reverse-engineering a model that successfully learns m...
arxiv.org
October 26, 2025 at 12:06 AM
1 like
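The long-range pitfall is visible in schoolbook multiplication itself: a single carry can ripple across many output digits, so high-order digits of the product depend on every input digit. A small worked example:

```python
# Schoolbook multiplication with the carry chain made explicit: the final
# carry pass is where one digit's value can depend on all earlier digits.
def multiply_digits(a: list[int], b: list[int]) -> list[int]:
    # digits are least-significant first, e.g. 123 -> [3, 2, 1]
    out = [0] * (len(a) + len(b))
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            out[i + j] += da * db
    carry = 0
    for k in range(len(out)):  # one carry can ripple through many digits
        carry, out[k] = divmod(out[k] + carry, 10)
    return out

# 999 * 999: low-order partial products force carries all the way up, so
# the top digit depends on every input digit -- a long-range dependency.
print(multiply_digits([9, 9, 9], [9, 9, 9]))  # [1, 0, 0, 8, 9, 9] = 998001
```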
Echoes of Humanity: Exploring the Perceived Humanness of AI Music
Recent advances in AI music (AIM) generation services are currently transforming the music industry. Given these advances, understanding how humans perceive AIM is crucial both to educate users on ide...
arxiv.org
October 25, 2025 at 12:06 AM
Humanoid Goalkeeper: Learning from Position Conditioned Task-Motion Constraints
We present a reinforcement learning framework for autonomous goalkeeping with humanoid robots in real-world scenarios. While prior work has demonstrated similar capabilities on quadrupedal platforms, ...
arxiv.org
October 25, 2025 at 12:06 AM
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
State-of-the-art text-to-video models excel at generating isolated clips but fall short of creating the coherent, multi-shot narratives that are the essence of storytelling. We bridge this "narrativ...
arxiv.org
October 25, 2025 at 12:06 AM
#TIL that #AlphaXiv has indexed the datasets mentioned in the #AI papers on #ArXiv that it has indexed:
www.alphaxiv.org?datasets=true
#AI #data
alphaXiv
Discuss, discover, and read arXiv papers.
www.alphaxiv.org
October 24, 2025 at 10:21 PM
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
Widespread LLM adoption has introduced characteristic repetitive phraseology, termed "slop," which degrades output quality and makes AI-generated text immediately recognizable. We present Antislop, a ...
arxiv.org
October 24, 2025 at 12:07 AM
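A toy version of the detection step: flag n-grams that are heavily over-represented in model outputs relative to a human reference corpus. The thresholds and function names below are invented for illustration; Antislop's actual pipeline is more involved:

```python
# Flag "slop" n-grams: phrases far more frequent in model text than in a
# human reference corpus.
from collections import Counter

def ngrams(text: str, n: int = 3):
    words = text.lower().split()
    return zip(*(words[i:] for i in range(n)))

def find_slop(model_texts, human_texts, ratio_threshold=5.0, min_count=3):
    model_counts = Counter(g for t in model_texts for g in ngrams(t))
    human_counts = Counter(g for t in human_texts for g in ngrams(t))
    slop = []
    for gram, c in model_counts.items():
        if c >= min_count and c / (human_counts[gram] + 1) >= ratio_threshold:
            slop.append((" ".join(gram), c))
    return sorted(slop, key=lambda x: -x[1])  # most over-used first
```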
TRM reproduction report
okay, i’m starting to believe TRM is legit. The 5M model does seem to hold up on almost all of its claims.
crazy.
repro report: github.com/alphaXiv/Tin...
code: github.com/alphaXiv/Tin...
(weights in readme)
October 23, 2025 at 11:38 AM
2 reposts
34 likes
8 saves
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
Instruction-based image editing has achieved remarkable progress; however, models solely trained via supervised fine-tuning often overfit to annotated patterns, hindering their ability to explore and ...
arxiv.org
October 23, 2025 at 12:06 AM
Glyph: Scaling Context Windows via Visual-Text Compression
Large language models (LLMs) increasingly rely on long-context modeling for tasks such as document understanding, code analysis, and multi-step reasoning. However, scaling context windows to the milli...
arxiv.org
October 23, 2025 at 12:06 AM
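The premise is that a densely rendered page can pack more characters into one vision patch than a tokenizer packs into one text token. A back-of-envelope sketch; the per-token and per-patch character counts are rough assumptions chosen only to illustrate the ratio, not measurements from the paper:

```python
# Compare token budgets for plain text vs. the same text rendered as an
# image, under assumed densities (~4 chars per text token, ~12 chars per
# 16x16 image patch).
def text_tokens(n_chars: int, chars_per_token: float = 4.0) -> int:
    return round(n_chars / chars_per_token)

def vision_tokens(n_chars: int, chars_per_patch: float = 12.0) -> int:
    # dense rendering: assume one 16x16 patch spans ~12 characters
    return round(n_chars / chars_per_patch)

n = 1_000_000  # a million-character document
print(f"{text_tokens(n):,} text tokens vs {vision_tokens(n):,} vision tokens")
# -> 250,000 text tokens vs ~83,333 vision tokens: roughly 3x fewer,
#    which is how a fixed context window stretches to longer documents.
```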
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Instruction-based video editing promises to democratize content creation, yet its progress is severely hampered by the scarcity of large-scale, high-quality training data. We introduce Ditto, a holist...
arxiv.org
October 22, 2025 at 12:07 AM
VISTA: A Test-Time Self-Improving Video Generation Agent
Despite rapid advances in text-to-video synthesis, generated video quality remains critically dependent on precise user prompts. Existing test-time optimization methods, successful in other domains, s...
arxiv.org
October 22, 2025 at 12:07 AM