Samuel Müller
@sammuller.bsky.social
(Tab)PFNs, TrivialAugment etc.
Check out our position paper and come to our ICML poster (Thursday 4:30 PM, East Exhibition Hall A-B E-606).

arxiv.org/abs/2505.23947 n/n
Position: The Future of Bayesian Prediction Is Prior-Fitted
Training neural networks on randomly generated artificial datasets yields Bayesian models that capture the prior defined by the dataset-generating distribution. Prior-data Fitted Networks (PFNs) are a...
arxiv.org
July 8, 2025 at 8:03 PM
There are already early examples of this, which we discuss, in areas as diverse as biology, Bayesian optimization, time-series forecasting, and tabular data. The most prominent is TabPFN (Nature '25). 5/n

news.ycombinator.com/item?id=4264...
Show HN: TabPFN v2 – A SOTA foundation model for small tabular data | Hacker News
news.ycombinator.com
July 8, 2025 at 8:03 PM
We go into detailed comparisons with other Bayesian methods and the trade-offs that lead us to conclude that PFNs will become dominant for Bayesian prediction, and further that Bayesian prediction will become more important overall as priors improve. 4/n
July 8, 2025 at 8:03 PM
What's nice is that, after training on this random data, the model will start to make sense of real-world data, too. It approximates the posterior belonging to the prior of choice, e.g., a BNN, a GP, or, in the most interesting cases, a Bayesian model that doesn't exist yet. 3/n
July 8, 2025 at 8:03 PM
Prior-data fitted networks (PFNs) do just that!

The PFN idea is to take a prior, e.g. a Bayesian neural network (BNN) prior, sample datasets from it, and then train a network to predict the hold-out labels of those datasets (no training on real-world data). 2/n
July 8, 2025 at 8:03 PM
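The training recipe in that post can be sketched in a few lines. This is a minimal toy in PyTorch, assuming a tiny random-weight network as the "BNN-like" prior; all names (`sample_dataset_from_prior`, `TinyPFN`) are illustrative, not the actual TabPFN code, and real PFNs use much richer priors, a proper attention mask, and a larger transformer:

```python
import torch
import torch.nn as nn

N_FEAT, N_CTX, N_TEST = 5, 40, 10

def sample_dataset_from_prior():
    # Sample a random function (random network weights), then a dataset from it.
    x = torch.randn(N_CTX + N_TEST, N_FEAT)
    w1, w2 = torch.randn(N_FEAT, 16), torch.randn(16, 1)
    y = torch.tanh(x @ w1) @ w2 + 0.1 * torch.randn(N_CTX + N_TEST, 1)
    return x, y

class TinyPFN(nn.Module):
    # Encodes (x, y) pairs as tokens; hold-out tokens get y = 0 (masked out).
    def __init__(self, d=64):
        super().__init__()
        self.embed = nn.Linear(N_FEAT + 1, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, 1)

    def forward(self, x, y_ctx):
        y_in = torch.cat([y_ctx, torch.zeros(N_TEST, 1)])  # hide hold-out labels
        tokens = self.embed(torch.cat([x, y_in], dim=-1)).unsqueeze(0)
        out = self.encoder(tokens).squeeze(0)
        return self.head(out[N_CTX:])  # predictions for the hold-out points

model = TinyPFN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(50):  # never sees real data: a fresh synthetic dataset each step
    x, y = sample_dataset_from_prior()
    pred = model(x, y[:N_CTX])
    loss = nn.functional.mse_loss(pred, y[N_CTX:])
    opt.zero_grad(); loss.backward(); opt.step()
```

After this kind of meta-training, a single forward pass on a new dataset amounts to approximate posterior prediction under the chosen prior.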
To then change it? As in "overhaul"?
April 15, 2025 at 1:57 PM
Find my full write-up (including scenarios with bad actors, as well as the prompts used) plus the game here: github.com/SamuelGabrie...
If you think my single-person experiment is not to be trusted, you are right: try it yourself!
GitHub - SamuelGabriel/LMARENA-GAMING
Contribute to SamuelGabriel/LMARENA-GAMING development by creating an account on GitHub.
github.com
February 24, 2025 at 1:17 PM
Combined with the large employee counts at top AI labs and the small number of votes per model on lmarena, this leads me to the conclusion that lmarena scores are probably dominated by biased votes.
February 24, 2025 at 1:17 PM
In hard mode I attributed 13/20 completely correctly, far above the ~3.3 expected from random guessing.
That is, after practicing with 20 questions, I could identify all 3 models correctly in 13 of 20 cases.
That means attributing responses to LLMs is super easy for humans.
February 24, 2025 at 1:17 PM
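The ~3.3 baseline is easy to check, assuming hard mode means matching 3 responses to 3 model names each round and a random guesser picks a uniformly random assignment:

```python
from fractions import Fraction

# 3 responses matched to 3 model names: 3! = 6 possible assignments,
# only one of which is fully correct.
p_round_correct = Fraction(1, 6)
rounds = 20
expected_correct_rounds = rounds * p_round_correct
print(float(expected_correct_rounds))  # ≈ 3.33
```

13 fully correct rounds out of 20 is roughly four times that chance level.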
I first played easy mode (see below), where I got two answers from each model and needed to match them.
I used 20 interactions in the easy mode to learn the models' behaviors.
In hard mode (see prev post), you need to match three responses to the LLM name.
February 24, 2025 at 1:17 PM
Second, employees are very likely able to tell models apart based on their gut feeling.
To figure out if this is the case, I created a game with two modes.
The game is about identifying which answer was provided by which LLM.
February 24, 2025 at 1:17 PM
First, AI labs have enough employees to bias the benchmarks.
E.g., Grok 3 has only 10K votes, while there are 2.7M votes in total on lmarena.
If half of, say, OpenAI's ~2,000 employees voted just once a day, they would make up >10% of all 2.7M lmarena votes over its one-year existence.
February 24, 2025 at 1:17 PM
seems to beat boosting there, too, but prob a bit early to make definitive statements
February 7, 2025 at 8:43 AM
What did you think was interesting? The interview had such bad timing, a few days before the r1 launch
January 26, 2025 at 7:12 AM
We currently have an R implementation under development. See here: github.com/robintibor/R...
GitHub - robintibor/R-tabpfn
Contribute to robintibor/R-tabpfn development by creating an account on GitHub.
github.com
January 11, 2025 at 2:10 PM
Thank you :) So far, we only open-source the model itself and how to use it. We do not open-source exactly how to train it, sorry for that :| There is a company starting based on the model, so that is kinda its moat.
January 9, 2025 at 11:53 AM