A new hybrid Mamba-2/attention LLM from NVIDIA that beats Qwen3-30B-A3B (same size & shape)
Notes:
* 1M context, with incredible recall past 256K
* New open datasets
* 10 open source RL environments
Overall this is a huge win for neolabs
huggingface.co/nvidia/NVIDI...
Placed 7th, beating Claude Code and most Codex variants. LOL.
The loop continues - Claude Opus 4.5 dropped.
First model to break 80% on real-world software engineering (SWE-bench Verified).
But the interesting part isn't just the benchmark; it's also what Anthropic is doing to make its smartest model usable day-to-day.
#Claude #Anthropic #Opus #GenAI
Now a third of the cost, and SOTA in programming
Like Gemini 3 Pro, people note that it can see a lot deeper into tough problems. That big model smell…
www.anthropic.com/news/claude-...
- They compared against Gemini 3 👍
- They showed a decent number of benchmarks
- It *actually* does well against Gemini
put differently: you can get Opus high now!
better faster stronger
Sonnet is still ahead on SWE-bench, though, while Gemini takes Terminal-Bench
Nice to see models getting better at different things
📈 1487 Elo on WebDev Arena, 76.2% on SWE-bench Verified
🛠️ Try out Google Antigravity: A new agentic IDE with direct access to the terminal, editor, and browser to build and validate code.
blog.google/products/gem...
better late than never, i guess
Almost the same performance as GPT-5-Codex on high, but 4x faster and without pesky things like a warm personality
www.neowin.net/amp/openai-i...
For instance, GPT-5 underperforms on GPQA Diamond but overperforms on VPCT.
Paper: arxiv.org/abs/2510.21614
Repo: github.com/metauto-ai/HGM
This model has been shaking up the benchmarks for the last week; now that it's open, we can see it's 230B-A10B, and it's dueling (arguably beating) Sonnet 4.5 at 8% of the cost
github.com/MiniMax-AI/M...