Nishith Jain (@kingnish.bsky.social) · Helping AI to become AGI
Paper: “ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning” Authors: Researchers from NUS, Stanford, and more
Paper Link: arxiv.org/abs/2510.27492
November 5, 2025 at 10:00 AM
But the real story is emergent behavior.
ThinkMorph shows:
1. Unseen visual manipulations (zoom, crop, inpaint)
2. Autonomous mode switching (text-only when optimal)
3. Enhanced test-time scaling via diverse multimodal trajectories
November 5, 2025 at 9:59 AM
Performance? ThinkMorph beats its base model by 34.74% on average. On Spatial Navigation, it jumps from 0.83% to 86.67%. It even rivals models 10x larger, outperforming Qwen2.5-VL-72B and InternVL3.5-38B on multiple benchmarks.
November 5, 2025 at 9:57 AM
Training data is key: 24K interleaved reasoning traces spanning four tasks (Jigsaw Assembly, Spatial Navigation, Visual Search, and Chart Refocus). Each task demands active visual manipulation, not passive image captioning.
November 5, 2025 at 9:52 AM
Built on Bagel-7B, ThinkMorph generates sequences of mixed tokens: text and image. It uses delimiter tokens to switch modalities mid-thought, enabling fluid transitions between visual manipulation and textual logic.
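To make the mechanics concrete, here is a minimal Python sketch (not ThinkMorph's actual API) of how an interleaved trace with modality delimiters could be split back into text and image thought steps. The delimiter strings are hypothetical placeholders, not the model's real special tokens.

```python
# Illustrative sketch only: splitting an interleaved chain of thought into
# alternating text / image steps. Delimiter names are hypothetical.
from dataclasses import dataclass

IMG_START, IMG_END = "<image_start>", "<image_end>"  # placeholder delimiters

@dataclass
class Step:
    modality: str   # "text" or "image"
    content: str    # text span, or a reference to generated image content

def split_interleaved_trace(trace: str) -> list[Step]:
    """Split a generated trace into alternating text / image thought steps."""
    steps, cursor = [], 0
    while cursor < len(trace):
        start = trace.find(IMG_START, cursor)
        if start == -1:                        # no more image segments
            if trace[cursor:].strip():
                steps.append(Step("text", trace[cursor:].strip()))
            break
        if trace[cursor:start].strip():        # text preceding the image step
            steps.append(Step("text", trace[cursor:start].strip()))
        end = trace.find(IMG_END, start)
        steps.append(Step("image", trace[start + len(IMG_START):end]))
        cursor = end + len(IMG_END)
    return steps

trace = ("The goal cell is top-right. <image_start>[cropped maze view]<image_end> "
         "From the crop, the only open path goes up, so move U, U, R.")
for s in split_interleaved_trace(trace):
    print(s.modality, "->", s.content)
```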
November 5, 2025 at 9:51 AM
Most multimodal models treat images as sidekicks to text. ThinkMorph rejects that. Inspired by human “think-and-sketch” strategies, it weaves visual and verbal reasoning into a single coherent chain, each modality pushing the other forward.
November 5, 2025 at 9:49 AM
If you're designing AI accelerators or deploying LLMs at scale, this study is a must-read. It’s time to rethink the FP-first approach.

Paper: “INT vs FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats”
Link: arxiv.org/pdf/2510.25602
November 4, 2025 at 7:43 AM
The takeaway: fine-grained INT quantization isn't just viable; in many cases it's superior. This paper provides a principled framework for choosing formats based on crest factor and granularity.
November 4, 2025 at 7:42 AM
Hardware matters too. MXINT8 reduces energy by 37% and area by 21% vs MXFP8. Mixed INT formats (MXINT8 + NVINT4) save 34% area and 25% energy over FP equivalents.
November 4, 2025 at 7:42 AM
Even at 4-bit precision, INT can win. With Hadamard rotation to suppress outliers, NVINT4 beat NVFP4 in 99.3% of tensor-wise tests.
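For intuition, here's a minimal NumPy sketch of the idea, assuming a 16-element block and a generic symmetric INT4 quantizer rather than the paper's exact NVINT4 layout: rotating each block with an orthonormal Hadamard matrix spreads an outlier's energy across the block, so the per-block scale stays tight.

```python
# Sketch of outlier suppression via a Hadamard rotation before INT4
# quantization. Block size 16 and the scale rule are illustrative assumptions.
import numpy as np
from scipy.linalg import hadamard

BLOCK = 16
H = hadamard(BLOCK) / np.sqrt(BLOCK)           # orthonormal Hadamard matrix

def int4_quant(x, scale):
    return np.clip(np.round(x / scale), -8, 7) * scale   # symmetric 4-bit grid

def block_quant(x):
    """Per-block symmetric INT4 quantization (scale = block max / 7)."""
    out = np.empty_like(x)
    for i in range(0, len(x), BLOCK):
        blk = x[i:i + BLOCK]
        scale = np.abs(blk).max() / 7 + 1e-12
        out[i:i + BLOCK] = int4_quant(blk, scale)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=1024)
x[::128] *= 20                                  # inject a few activation outliers

# Plain INT4: an outlier inflates its block's scale, crushing small values.
err_plain = np.mean((x - block_quant(x)) ** 2)

# Rotated INT4: rotate each block, quantize, rotate back (H is orthogonal).
xr = (x.reshape(-1, BLOCK) @ H).reshape(-1)
xq = block_quant(xr).reshape(-1, BLOCK) @ H.T
err_rot = np.mean((x - xq.reshape(-1)) ** 2)

print(f"MSE plain INT4:   {err_plain:.5f}")
print(f"MSE rotated INT4: {err_rot:.5f}")
```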
November 4, 2025 at 7:42 AM
All three show that INT formats, especially MXINT8, can match or beat FP formats.

MXINT8 outperformed MXFP8 in every test:
- Higher average QSNR (40.35 dB vs 31.50 dB)
- Nearly lossless training performance
- 100% win rate in tensor-wise comparisons
November 4, 2025 at 7:42 AM
Three evaluation methods:

1. Tensor-wise analysis on Llama-3.1-8B
2. Direct-cast inference on 12 LLMs (0.6B–235B)
3. Low-bit training on 1B & 3B models
November 4, 2025 at 7:41 AM
The authors introduce a theoretical framework using Quantization Signal-to-Noise Ratio (QSNR) and crest factor (κ = max(|X|)/σ, the tensor's peak-to-standard-deviation ratio) to compare INT and FP formats.
INT QSNR drops with high κ but improves with finer granularity. FP QSNR depends on mantissa width and subnormal values.
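A rough empirical illustration (not the paper's derivation): simulate a per-tensor symmetric INT8 quantizer and watch QSNR fall as the crest factor grows.

```python
# Sketch: crest factor (kappa = max|X| / std) vs. empirical QSNR for a
# generic per-tensor symmetric INT8 quantizer. Illustrative only.
import numpy as np

def crest_factor(x):
    return np.abs(x).max() / x.std()

def qsnr_db(x, xq):
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))

def int8_quant(x):
    scale = np.abs(x).max() / 127            # per-tensor scale set by the peak
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
base = rng.normal(size=4096)

for peak in (4.0, 40.0, 400.0):               # progressively larger outlier
    x = base.copy()
    x[0] = peak                                # a single large activation
    print(f"kappa = {crest_factor(x):6.1f}   "
          f"INT8 QSNR = {qsnr_db(x, int8_quant(x)):5.1f} dB")
```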
November 4, 2025 at 7:41 AM
The key insight: granularity matters. At fine-grained (block-wise) levels, local dynamic ranges shrink, reducing outliers. This is where INT formats shine.
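A small sketch of that effect, using a generic symmetric INT8 quantizer and illustrative block sizes (the MX formats use their own block layout and scale encoding): shrinking the block confines each outlier's influence to its own block, so the rest of the tensor keeps a tight local scale.

```python
# Sketch: per-tensor vs. block-wise INT8 scaling on a tensor with sparse
# outliers. Block sizes are illustrative, not the MX spec.
import numpy as np

def qsnr_db(x, xq):
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))

def int8_blockwise(x, block):
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        scale = np.abs(blk).max() / 127 + 1e-12   # local scale per block
        out[i:i + block] = np.clip(np.round(blk / scale), -127, 127) * scale
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=8192)
x[::1024] = 50.0                                   # sparse, large activation outliers

for block in (8192, 1024, 128, 32):                # per-tensor down to MX-like blocks
    print(f"block = {block:5d}   QSNR = {qsnr_db(x, int8_blockwise(x, block)):5.1f} dB")
```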
November 4, 2025 at 7:41 AM
Quantization is essential for scaling LLMs. But activation outliers make low-bit quantization tricky. FP formats are favored for their dynamic range, but this study shows INT formats can outperform them under the right conditions.
November 4, 2025 at 7:40 AM