desunit
desunit.bsky.social
Entrepreneur

http://rwiz.ai - Handling reviews with AI
🎹 http://pianocompanion.info - Chords dictionary app with 1M+ downloads.
🕹️ http://chordiq.info - Learn chords.
📝 desunit.com - my blog
MathArena.ai
MathArena: Evaluating LLMs on Uncontaminated Math Benchmarks
matharena.ai
February 11, 2026 at 7:11 PM
If a system can correctly answer half of brand-new research math questions, sourced from papers published weeks ago, the bar has moved. A lot.

What happens when reasoning keeps improving, but humans keep arguing using 2022 mental models?

... just saying.
February 11, 2026 at 7:11 PM
Producing a final answer is much easier than proving it rigorously.

But the old argument - "just a parrot, repeating old stuff on loop" -
is getting weaker every month.
February 11, 2026 at 7:11 PM
> require understanding new results, not recalling textbooks

Yet people still say: "AI can’t handle unknown equations..." "AI isn’t creative..."

This is basically checkmate.

This does not mean AI can write 60% of math papers.
February 11, 2026 at 7:11 PM
> final answers only (no "almost right" reasoning)

The results?

‼️ Top models get ~50–60% correct answers ‼️

GPT-5.2 - 60%.
Gemini-3-Pro is right behind.

These are problems that:

> an average human cannot solve at all
> many math grads would struggle with
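The "final answers only" rule above can be sketched as a tiny grader. This is a hypothetical illustration, not the benchmark's actual harness: the helper names are mine, and I'm assuming answers arrive in LaTeX \boxed{} form, which the posts don't specify.

```python
# Sketch of "final answers only" grading: a response is correct only if
# its final boxed answer matches the reference after normalization --
# no partial credit for "almost right" reasoning.
import re

def extract_final_answer(output: str) -> str:
    """Pull the last \\boxed{...} answer out of a model's response."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", output)
    return matches[-1].strip() if matches else ""

def normalize(ans: str) -> str:
    """Crude normalization: drop spaces and a leading '+'."""
    return ans.replace(" ", "").lstrip("+")

def grade(output: str, reference: str) -> bool:
    return normalize(extract_final_answer(output)) == normalize(reference)

# Correct final answer passes; a wrong one fails no matter how
# plausible the reasoning before it looked.
print(grade("Long derivation... so \\boxed{ 42 }", "42"))  # True
print(grade("Long derivation... so \\boxed{41}", "42"))    # False
```

Real harnesses need much smarter answer normalization (fractions, equivalent forms), but the all-or-nothing scoring idea is the same.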
February 11, 2026 at 7:11 PM
I just stumbled on ArXivMath - a fresh benchmark that evaluates LLMs on research-level mathematical problems taken from recent arXiv papers (from the last month, you could say). That means:

> minimal training contamination
> no memorization of a static benchmark
February 11, 2026 at 7:11 PM
Like it or not, the future isn’t won by perfect demos/clean decks.
It’s won by whoever ships early, floods the market, and improves in public.

Waiting for v1.0 is how you end up losing to someone who ships v0.1 at scale... a really large scale.
February 9, 2026 at 7:12 PM
Robots today are awkward/limited/silly

But who cares?!

Volume creates learning loops →
Learning loops create cost drops →
Cost drops create adoption →
Adoption creates dominance

Exactly what we've seen with EVs. Same logic shows up in AI adoption.
February 9, 2026 at 7:12 PM
Talking to an LLM is often a better experience than learning another complex tool.
February 6, 2026 at 7:16 PM
- Interface-based moats are dying
- Proprietary data still matters
- Whoever owns the chat interface becomes the new aggregator

Yes, it’s painful, especially if you’ve spent years building beautiful UX, but the reality is simple:
February 6, 2026 at 7:16 PM
That’s the scary part. No brand visibility, no UX differentiation, no workflow lock-in. Pricing power collapses unless the data is truly proprietary. If your data can be licensed, scraped, or replicated, there’s no moat left - just commodity competition.

Takeaways:
February 6, 2026 at 7:16 PM
You don’t open tools, learn workflows, or even know which vendor is used. You just ask: Give me XXX, analyze YYY, run ZZZ

When the interface disappears, all that’s left is API vs API.
February 6, 2026 at 7:16 PM
In Web 2.0, aggregators like Google commoditized discovery. At the same time, suppliers still owned two things:
- interface
- data

That’s why vertical software could charge premium prices.

But it looks like LLMs change that.

The LLM chat becomes the interface.
February 6, 2026 at 7:16 PM
For years, software companies didn’t win because of data - they won because of interfaces: complex workflows/plugins/exports/shortcuts. I know of several examples where that friction created massive switching costs. Basically, their interface was the moat.
February 6, 2026 at 7:16 PM
The video of the app:
February 4, 2026 at 7:18 PM
5/ But now it’s a skill you actually need; otherwise, you’ll just waste time watching the LLM think and craft code.

LLMs are slow. Humans shouldn’t be idle while they think.
If you have experience and can keep context in your head, AI turns you into a force multiplier.
February 4, 2026 at 7:18 PM
4/ The interesting part is that while the LLM was implementing features we agreed on and planned, I was switching between several other projects. Reviewing. Thinking. Deciding what’s next.

I always believed context switching is bad. And it probably still is.
February 4, 2026 at 7:18 PM
3/ > Ingest incoming house invoices and split them across apartments
> Manage parking spots
…and a lot more

Could this have been done this fast a couple of years ago? I doubt it.
> Not with this scope.
> Not just me.
> And definitely not while juggling other projects.
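The invoice-splitting step above could look something like this. Purely hypothetical - the posts don't describe the actual allocation rule, so this sketch assumes costs are split proportionally to apartment floor area, with rounding handled in cents:

```python
# Hypothetical sketch: allocate a shared house invoice across
# apartments proportionally to floor area, in integer cents, with the
# last apartment absorbing any rounding remainder so totals balance.

def split_invoice(total_cents: int, areas: dict[str, float]) -> dict[str, int]:
    total_area = sum(areas.values())
    shares: dict[str, int] = {}
    allocated = 0
    units = list(areas)
    for unit in units[:-1]:
        share = round(total_cents * areas[unit] / total_area)
        shares[unit] = share
        allocated += share
    shares[units[-1]] = total_cents - allocated  # remainder absorbs rounding
    return shares

# A 100.00 invoice split across three apartments by area:
print(split_invoice(10_000, {"apt1": 50.0, "apt2": 30.0, "apt3": 20.0}))
# {'apt1': 5000, 'apt2': 3000, 'apt3': 2000}
```

Giving the remainder to one unit keeps the shares summing exactly to the invoice total, which matters once you reconcile against the bank statement.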
February 4, 2026 at 7:18 PM
2/ To better understand the amount of work, here's what the system does:
> Send invoices
> Collect cold water meter readings
> Ping tenants who forgot to submit them
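The "ping tenants who forgot" step boils down to a set difference. A minimal sketch, with made-up names and contacts (the real system's data model isn't described in the thread):

```python
# Hedged sketch of the reminder step: compare the full tenant list
# against the readings received this period and return the contacts
# that still need a nudge.

def missing_readings(tenants: dict[str, str],
                     readings: dict[str, float]) -> list[str]:
    """tenants maps apartment -> contact; readings maps apartment -> value."""
    return [contact for apt, contact in tenants.items() if apt not in readings]

tenants = {"apt1": "alice@example.com", "apt2": "bob@example.com"}
readings = {"apt1": 123.4}  # apt2 never submitted this month
print(missing_readings(tenants, readings))  # ['bob@example.com']
```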
February 4, 2026 at 7:18 PM
1/ I love my wife, and I couldn’t watch her waste time on things that can be easily automated. The math is simple - 2 days per house turns into 24 days a year.

You know how it works - happy wife, happy family.
February 4, 2026 at 7:18 PM