Nishith Jain (@kingnish.bsky.social) · Helping AI to become AGI
Paper: “ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning” Authors: Researchers from NUS, Stanford, and more
Paper Link: arxiv.org/abs/2510.27492
November 5, 2025 at 10:00 AM
But the real story is emergent behavior.
ThinkMorph shows:
1. Unseen visual manipulations (zoom, crop, inpaint)
2. Autonomous mode switching (text-only when optimal)
3. Enhanced test-time scaling via diverse multimodal trajectories
November 5, 2025 at 9:59 AM
Performance? ThinkMorph beats its base model by 34.74% on average. On Spatial Navigation, it jumps from 0.83% to 86.67%. It even rivals models 10x larger, outperforming Qwen2.5-VL-72B and InternVL3.5-38B on multiple benchmarks.
November 5, 2025 at 9:57 AM
Training data is key: 24K interleaved reasoning traces spanning four tasks (Jigsaw Assembly, Spatial Navigation, Visual Search, and Chart Refocus). Each task demands active visual manipulation, not passive image captioning.
November 5, 2025 at 9:52 AM
Built on Bagel-7B, ThinkMorph generates sequences of mixed tokens: text and image. It uses delimiter tokens to switch modalities mid-thought, enabling fluid transitions between visual manipulation and textual logic.
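To make the mechanics concrete, here is a minimal Python sketch (not ThinkMorph's actual API) of how an interleaved trace with modality delimiters could be split back into text and image thought steps. The delimiter strings are hypothetical placeholders, not the model's real special tokens.

```python
# Illustrative sketch only: splitting an interleaved chain of thought into
# alternating text / image steps. Delimiter names are hypothetical.
from dataclasses import dataclass

IMG_START, IMG_END = "<image_start>", "<image_end>"  # placeholder delimiters

@dataclass
class Step:
    modality: str   # "text" or "image"
    content: str    # text span, or a reference to generated image content

def split_interleaved_trace(trace: str) -> list[Step]:
    """Split a generated trace into alternating text / image thought steps."""
    steps, cursor = [], 0
    while cursor < len(trace):
        start = trace.find(IMG_START, cursor)
        if start == -1:                        # no more image segments
            if trace[cursor:].strip():
                steps.append(Step("text", trace[cursor:].strip()))
            break
        if trace[cursor:start].strip():        # text preceding the image step
            steps.append(Step("text", trace[cursor:start].strip()))
        end = trace.find(IMG_END, start)
        steps.append(Step("image", trace[start + len(IMG_START):end]))
        cursor = end + len(IMG_END)
    return steps

trace = ("The goal cell is top-right. <image_start>[cropped maze view]<image_end> "
         "From the crop, the only open path goes up, so move U, U, R.")
for s in split_interleaved_trace(trace):
    print(s.modality, "->", s.content)
```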
November 5, 2025 at 9:51 AM
Most multimodal models treat images as sidekicks to text. ThinkMorph rejects that. Inspired by human “think-and-sketch” strategies, it weaves visual and verbal reasoning into a single coherent chain, each modality pushing the other forward.
November 5, 2025 at 9:49 AM
If you're designing AI accelerators or deploying LLMs at scale, this study is a must-read. It’s time to rethink the FP-first approach.

Paper: “INT vs FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats”
Link: arxiv.org/pdf/2510.25602
November 4, 2025 at 7:43 AM
The takeaway: fine-grained INT quantization isn't just viable; in many cases it's superior. This paper provides a principled framework for choosing formats based on crest factor and granularity.
November 4, 2025 at 7:42 AM
Hardware matters too. MXINT8 reduces energy by 37% and area by 21% vs MXFP8. Mixed INT formats (MXINT8 + NVINT4) save 34% area and 25% energy over FP equivalents.
November 4, 2025 at 7:42 AM
Even at 4-bit precision, INT can win. With Hadamard rotation to suppress outliers, NVINT4 beat NVFP4 in 99.3% of tensor-wise tests.
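For intuition, here's a minimal NumPy sketch of the idea, assuming a 16-element block and a generic symmetric INT4 quantizer rather than the paper's exact NVINT4 layout: rotating each block with an orthonormal Hadamard matrix spreads an outlier's energy across the block, so the per-block scale stays tight.

```python
# Sketch of outlier suppression via a Hadamard rotation before INT4
# quantization. Block size 16 and the scale rule are illustrative assumptions.
import numpy as np
from scipy.linalg import hadamard

BLOCK = 16
H = hadamard(BLOCK) / np.sqrt(BLOCK)           # orthonormal Hadamard matrix

def int4_quant(x, scale):
    return np.clip(np.round(x / scale), -8, 7) * scale   # symmetric 4-bit grid

def block_quant(x):
    """Per-block symmetric INT4 quantization (scale = block max / 7)."""
    out = np.empty_like(x)
    for i in range(0, len(x), BLOCK):
        blk = x[i:i + BLOCK]
        scale = np.abs(blk).max() / 7 + 1e-12
        out[i:i + BLOCK] = int4_quant(blk, scale)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=1024)
x[::128] *= 20                                  # inject a few activation outliers

# Plain INT4: an outlier inflates its block's scale, crushing small values.
err_plain = np.mean((x - block_quant(x)) ** 2)

# Rotated INT4: rotate each block, quantize, rotate back (H is orthogonal).
xr = (x.reshape(-1, BLOCK) @ H).reshape(-1)
xq = block_quant(xr).reshape(-1, BLOCK) @ H.T
err_rot = np.mean((x - xq.reshape(-1)) ** 2)

print(f"MSE plain INT4:   {err_plain:.5f}")
print(f"MSE rotated INT4: {err_rot:.5f}")
```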
November 4, 2025 at 7:42 AM
All three show that INT formats, especially MXINT8, can match or beat FP formats.

MXINT8 outperformed MXFP8 in every test:
- Higher average QSNR (40.35 dB vs 31.50 dB)
- Nearly lossless training performance
- 100% win rate in tensor-wise comparisons
November 4, 2025 at 7:42 AM
Three evaluation methods:

1. Tensor-wise analysis on Llama-3.1-8B
2. Direct-cast inference on 12 LLMs (0.6B–235B)
3. Low-bit training on 1B & 3B models
November 4, 2025 at 7:41 AM
The authors introduce a theoretical framework using Quantization Signal-to-Noise Ratio (QSNR) and crest factor (κ = max(|X|)/σ, the tensor's peak-to-standard-deviation ratio) to compare INT and FP formats.
INT QSNR drops with high κ but improves with finer granularity. FP QSNR depends on mantissa width and subnormal values.
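A rough empirical illustration (not the paper's derivation): simulate a per-tensor symmetric INT8 quantizer and watch QSNR fall as the crest factor grows.

```python
# Sketch: crest factor (kappa = max|X| / std) vs. empirical QSNR for a
# generic per-tensor symmetric INT8 quantizer. Illustrative only.
import numpy as np

def crest_factor(x):
    return np.abs(x).max() / x.std()

def qsnr_db(x, xq):
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))

def int8_quant(x):
    scale = np.abs(x).max() / 127            # per-tensor scale set by the peak
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
base = rng.normal(size=4096)

for peak in (4.0, 40.0, 400.0):               # progressively larger outlier
    x = base.copy()
    x[0] = peak                                # a single large activation
    print(f"kappa = {crest_factor(x):6.1f}   "
          f"INT8 QSNR = {qsnr_db(x, int8_quant(x)):5.1f} dB")
```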
November 4, 2025 at 7:41 AM
The key insight: granularity matters. At fine-grained (block-wise) levels, local dynamic ranges shrink, reducing outliers. This is where INT formats shine.
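A small sketch of that effect, using a generic symmetric INT8 quantizer and illustrative block sizes (the MX formats use their own block layout and scale encoding): shrinking the block confines each outlier's influence to its own block, so the rest of the tensor keeps a tight local scale.

```python
# Sketch: per-tensor vs. block-wise INT8 scaling on a tensor with sparse
# outliers. Block sizes are illustrative, not the MX spec.
import numpy as np

def qsnr_db(x, xq):
    return 10 * np.log10(np.sum(x ** 2) / np.sum((x - xq) ** 2))

def int8_blockwise(x, block):
    out = np.empty_like(x)
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        scale = np.abs(blk).max() / 127 + 1e-12   # local scale per block
        out[i:i + block] = np.clip(np.round(blk / scale), -127, 127) * scale
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=8192)
x[::1024] = 50.0                                   # sparse, large activation outliers

for block in (8192, 1024, 128, 32):                # per-tensor down to MX-like blocks
    print(f"block = {block:5d}   QSNR = {qsnr_db(x, int8_blockwise(x, block)):5.1f} dB")
```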
November 4, 2025 at 7:41 AM
Quantization is essential for scaling LLMs. But activation outliers make low-bit quantization tricky. FP formats are favored for their dynamic range, but this study shows INT formats can outperform them under the right conditions.
November 4, 2025 at 7:40 AM