Blog → sakana.ai/ctm
Modern AI is powerful, but it's still distinct from human-like flexible intelligence. We believe neural timing is key. Our Continuous Thought Machine is built from the ground up to use neural dynamics as a powerful representation for intelligence.
huggingface.co/moonshotai/K...
✨ 7B
✨ 13M+ hours of pretraining data
✨ Novel hybrid input architecture
✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
but first they release the paper describing generative reward modeling (GRM) via Self-Principled Critique Tuning (SPCT)
looking forward to DeepSeek-GRM!
arxiv.org/abs/2504.02495
So naturally, some people put it to the test — hours after the 2025 US Math Olympiad problems were released.
The result: They all sucked!
Moonlight: 3B/16B MoE model trained with Muon on 5.7T tokens, advancing the Pareto frontier with better performance at fewer FLOPs.
huggingface.co/moonshotai
It uses four major design components: serverless abstraction and infrastructure, serving engine, scheduling algorithms, and scaling optimizations.
LLMs suck at long context.
This paper shows what I have seen in most deployments.
With longer contexts, performance degrades.
We are happy to "quietly" release our latest GRPO-trained Tulu 3.1 model, which is considerably better on MATH and GSM8K!
Particle filtering approach to improved inference w/o any training!
Check out probabilistic-inference-scaling.github.io
By Aisha Puri et al. 📈🤖
Joint MIT-CSAIL & Red Hat
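The gist, as a generic particle-filtering sketch (my reading of the approach, not the paper's code; `extend` and `score` are hypothetical stand-ins for a step generator and a process reward model): keep several partial solutions alive, weight each by the reward model, and resample so compute concentrates on promising reasoning paths.

```python
import random

def particle_filter_inference(prompt, extend, score, n_particles=8, n_steps=10):
    """Hypothetical sketch: extend(text) appends one reasoning step,
    score(text) returns a process-reward score in [0, 1]."""
    particles = [prompt] * n_particles
    for _ in range(n_steps):
        particles = [extend(p) for p in particles]   # propagate each particle
        weights = [score(p) for p in particles]      # weight by reward model
        total = sum(weights) or 1.0
        particles = random.choices(                  # resample proportionally
            particles, weights=[w / total for w in weights], k=n_particles
        )
    return max(particles, key=score)                 # highest-scoring trajectory
```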
A guide to post-training an LLM using GRPO. It's particularly effective for scaling test-time compute for extended reasoning, making it an ideal approach for complex tasks such as mathematical problem-solving.
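The core trick is easy to sketch: sample a group of completions per prompt, score each one, and normalize rewards within the group, so no separate value model is needed. A minimal illustration of the advantage computation (my sketch, not the guide's code):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages: each sampled completion is scored
    against the mean/std of its own group (no value network)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# e.g. 8 completions sampled for one math prompt, rewarded 1.0 when the
# final answer checks out, else 0.0 (a typical verifiable reward)
rewards = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0])
print(grpo_advantages(rewards))  # positive for correct, negative otherwise
```

Note that an all-correct or all-wrong group yields advantages near zero, so such prompts contribute little gradient signal.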
with @polsa.studenci @astro_peggy @astro_slawosz @tibor_to_orbit
uv run --python 3.12 --with '.[test]' pytest
https://til.simonwillison.net/pytest/pytest-uv
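For context, `--with '.[test]'` installs the current project plus its `test` extra into uv's ephemeral environment before invoking pytest. A hypothetical minimal setup this assumes (the extra's name and contents are illustrative):

```python
# tests/test_example.py
# Assumed (hypothetical) project config: pyproject.toml declares pytest
# under a "test" extra, e.g.
#   [project.optional-dependencies]
#   test = ["pytest"]
# `uv run --with '.[test]' pytest` then builds a throwaway environment
# with the project + that extra and runs tests like this one.

def test_addition():
    assert 1 + 1 == 2
```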
Starting with Qwen2VL-Instruct-2B, they spent $3 on compute and got it to outperform the 72B model
github.com/Deep-Agent/R...
📄 Blog: qwenlm.github.io/blog/qwen2.5...
This is an attempt to consolidate the dizzying rate of AI developments since Christmas. If you're into AI but haven't been following closely, this should get you oriented again.
timkellogg.me/blog/2025/01...
Enums are objects, why not give them attributes?
https://blog.glyph.im/2025/01/active-enum.html
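A minimal sketch of the idea, adapted from the Planet example in the stdlib enum docs (not necessarily the post's own example): a tuple value gets unpacked by __init__ into real attributes on each member.

```python
from enum import Enum

class Planet(Enum):
    # each member's tuple value is unpacked into __init__ arguments
    MERCURY = (3.303e23, 2.4397e6)
    EARTH = (5.976e24, 6.37814e6)

    def __init__(self, mass: float, radius: float):
        self.mass = mass      # kg
        self.radius = radius  # m

    @property
    def surface_gravity(self) -> float:
        G = 6.67430e-11  # gravitational constant
        return G * self.mass / self.radius ** 2

print(Planet.EARTH.surface_gravity)  # ≈ 9.8 m/s^2
```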
64% R1+Sonnet
62% o1
57% R1
52% Sonnet
48% DeepSeek V3
aider.chat/2025/01/24/r...
That’s 100% on needle in the haystack all the way through 4M, as I understand (seems like a benchmark mistake tbqh, it’s too good)
www.minimaxi.com/en/news/mini...
- Performance surpasses models like Llama3.1-8B and Qwen2.5-7B
- Capable of deep reasoning with system prompts
- Trained only on 4T high-quality tokens
huggingface.co/collections/...
Details in 🧵
That was quick! Is this already the Alpaca moment for reasoning models?
Source: novasky-ai.github.io/posts/sky-t1/
Here's everything released; find the text-readable version here huggingface.co/posts/merve/...
All models are here huggingface.co/collections/...
Let the LLM 'contemplate' for a bit before answering by using this simple system prompt, which can often lead to the correct final answer!
maharshi.bearblog.dev/contemplativ...
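One quick way to try the pattern (the system prompt below paraphrases the idea; the exact wording is in the post):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Paraphrased "contemplative" instruction, not the post's exact prompt.
SYSTEM_PROMPT = (
    "Before answering, think out loud inside <contemplation> tags: "
    "explore the problem, question your assumptions, and consider "
    "alternative approaches. Only then state your final answer."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model works here
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Is 3821 a prime number?"},
    ],
)
print(response.choices[0].message.content)
```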