Nishith Jain
@kingnish.bsky.social
Helping AI to become AGI
But the real story is emergent behavior.
ThinkMorph shows:
1. Visual manipulations unseen in training (zoom, crop, inpaint)
2. Autonomous mode switching (text-only when optimal)
3. Enhanced test-time scaling via diverse multimodal trajectories
November 5, 2025 at 9:59 AM
Performance? ThinkMorph beats its base model by 34.74% on average. On Spatial Navigation, it jumps from 0.83% to 86.67%. It even rivals models 10x larger, outperforming Qwen2.5-VL-72B and InternVL3.5-38B on multiple benchmarks.
November 5, 2025 at 9:57 AM
Training data is key: 24K interleaved reasoning traces spanning four tasks (Jigsaw Assembly, Spatial Navigation, Visual Search, and Chart Refocus). Each task demands active visual manipulation, not passive image captioning.
November 5, 2025 at 9:52 AM
New paper: ThinkMorph introduces a unified multimodal model that interleaves text and image reasoning steps, setting a new benchmark for vision-centric tasks. It’s not just another CoT model. It’s a rethink of how language and vision should collaborate.
November 5, 2025 at 9:48 AM
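Concretely, an interleaved trajectory alternates written thoughts with model-generated intermediate images. A hypothetical sketch of such a trajectory as plain data (the step contents and the visual-search example are illustrative, not taken from the paper):

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class Step:
    kind: Literal["text", "image"]  # a written thought or an intermediate image
    content: str                    # the thought itself, or a reference to the image

# Hypothetical trajectory for a visual-search style question (illustrative only)
trajectory = [
    Step("text", "The target is probably in the upper-right region of the scene."),
    Step("image", "zoom_upper_right.png"),  # model-produced visual reasoning step
    Step("text", "The zoomed view confirms the object; answer: B."),
]

for step in trajectory:
    print(step.kind, "->", step.content)
```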
The authors introduce a theoretical framework using Quantization Signal-to-Noise Ratio (QSNR) and crest factor (κ = max(|X|)/σ) to compare INT and FP formats.
INT QSNR drops with high κ, but improves with finer granularity. FP QSNR depends on mantissa width and subnormal values.
November 4, 2025 at 7:41 AM
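For intuition, here is a minimal sketch (my own, not the paper's code) of the two quantities: the crest factor κ = max(|X|)/σ, and the QSNR of simple symmetric per-tensor INT8 quantization. A heavy-tailed tensor (high κ) loses QSNR relative to a near-Gaussian one:

```python
import numpy as np

def crest_factor(x: np.ndarray) -> float:
    # kappa = max(|X|) / sigma
    return float(np.abs(x).max() / x.std())

def int_qsnr_db(x: np.ndarray, bits: int = 8) -> float:
    # Symmetric uniform quantization: the scale maps max(|x|) to the largest level.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    x_hat = np.clip(np.round(x / scale), -qmax, qmax) * scale
    noise = x - x_hat
    return float(10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2)))

rng = np.random.default_rng(0)
tensors = {
    "near-gaussian": rng.normal(size=100_000),
    "heavy-tailed": rng.standard_t(df=3, size=100_000),  # outlier-prone, high kappa
}
for name, x in tensors.items():
    print(f"{name:13s} kappa = {crest_factor(x):5.2f}   INT8 QSNR = {int_qsnr_db(x):5.2f} dB")
```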
The key insight: granularity matters. At fine-grained (block-wise) levels, local dynamic ranges shrink, so outliers have far less influence on the quantization scale. This is where INT formats shine.
November 4, 2025 at 7:41 AM
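A toy illustration of that point, assuming symmetric INT8 with either one per-tensor scale or one scale per block (the block size of 32 is an arbitrary choice, not from the paper): a few injected outliers stretch the per-tensor quantization step, while block-wise scales keep the damage local.

```python
import numpy as np
from typing import Optional

def int8_mse(x: np.ndarray, block: Optional[int] = None) -> float:
    # Mean squared quantization error for symmetric INT8, with one per-tensor
    # scale (block=None) or one scale per contiguous block of `block` values.
    qmax = 127
    xb = x.reshape(-1, block) if block else x.reshape(1, -1)
    scale = np.abs(xb).max(axis=1, keepdims=True) / qmax
    x_hat = np.clip(np.round(xb / scale), -qmax, qmax) * scale
    return float(np.mean((xb - x_hat) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=4096)
x[::512] *= 50  # a handful of large outliers

print("per-tensor MSE:", int8_mse(x))            # one scale, stretched by the outliers
print("block-32   MSE:", int8_mse(x, block=32))  # local ranges shrink, error drops
```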
The AI hardware industry is betting big on low-precision floating-point formats like FP8. But what if integer formats are actually better for both accuracy and efficiency?

A new paper challenges the FP-first narrative🧵
November 4, 2025 at 7:40 AM