Lightnews — Scholar-powered news

Clayton Thorrez

@cthorrez.bsky.social

EsportsBench refreshed with data up through June 2025, over 61k new matches across 20 esports have been recorded in the last 3 months!
huggingface.co/datasets/Esp...

July 16, 2025 at 6:31 AM

Clayton Thorrez

@cthorrez.bsky.social

Extremely excited to announce that I've joined @lmarena.bsky.social !

For years I've been working in LLMs for my job, and hacking on rankings and ratings for fun, beyond thrilled to be able to join this project at the intersection!

July 15, 2025 at 3:36 AM

Clayton Thorrez

@cthorrez.bsky.social

Just ran into Simpsons paradox in the wild for the first time lol. Was looking at some data and was like "that doesn't look right all the means went up when all I did was assign groups differently, this is like Simpson's paradox or something lol"

July 10, 2025 at 4:08 AM

Clayton Thorrez

@cthorrez.bsky.social

if you fork any of my GitHub repos I WILL add you on LinkedIn.

There are so few people actively working on rating system stuff and I want to talk to all of them

July 3, 2025 at 10:31 PM

Clayton Thorrez

@cthorrez.bsky.social

I just realized MSI is going on and in Vancouver so got a ticket for tomorrow lol.

Any on esports/machine learning people going?

July 3, 2025 at 8:57 PM

Clayton Thorrez

@cthorrez.bsky.social

I love it when the same notation can mean the *exact* opposite thing when used by different authors...

Should "A ≻ B" mean:
"A is preferred to B (higher rating)"
"B is preferred to A (lower rank number)"?

arxiv.org/pdf/2411.049...
www.tandfonline.com/doi/full/10....

July 3, 2025 at 6:56 PM

Clayton Thorrez

@cthorrez.bsky.social

if you have a really small network, and a really small dataset, is it possible to fuse an entire transformer training loop into a single kernel?

July 3, 2025 at 3:13 PM

Clayton Thorrez

@cthorrez.bsky.social

It turns out adding sum to 0 constraints on some parameters is actually a fair but harder than simplex constraints (non negative, sum to 1) in the iterative gradient based optimization setting.

Is there a good equivalent to the softmax truck?

July 2, 2025 at 2:16 AM

Clayton Thorrez

@cthorrez.bsky.social

Super disappointed by this.

I'm a huge fan of esports, attending numerous events over the past 12 years and watching countless hours on twitch. I would rather see esports shrink down to a grassroots core than get children addicted to gambling.

x.com/riotgames/st...

Riot Games on X: "Why We're Opening Betting Sponsorships in Esports & How We're Doing It Responsibly" / X

Why We're Opening Betting Sponsorships in Esports & How We're Doing It Responsibly

x.com

June 26, 2025 at 11:51 PM

Clayton Thorrez

@cthorrez.bsky.social

qwen3-235b be like

June 24, 2025 at 9:57 PM

Clayton Thorrez

@cthorrez.bsky.social

How much value does thinking add to an LLM?

Well for the largest Qwen3, the answer is -28 points

Thinking on academic benchmarks seems to help a lot, I wonder what's going wrong in the arena?

Maybe people can sense the hedging and don't like it, or it poisons its own context with overthinking

June 24, 2025 at 9:56 PM

Clayton Thorrez

@cthorrez.bsky.social

there's a lotta with the last name Ferguson

not a lotta people with the first name Fergus

June 19, 2025 at 6:04 AM

Clayton Thorrez

@cthorrez.bsky.social

Mark Glickman is on a roll now! 2 Paper in two weeks
This time extending the stength dependent draw model to the online setting for use in dynamic rating systems.

Haven't read the whole thing but it looks to contain some cool approximation tricks for the posterior

arxiv.org/abs/2506.11354

Rating competitors in games with strength-dependent tie probabilities

Competitor rating systems for head-to-head games are typically used to measure playing strength from game outcomes. Ratings computed from these systems are often used to select top competitors for eli...

arxiv.org

June 19, 2025 at 4:01 AM

Clayton Thorrez

@cthorrez.bsky.social

When gemini writes code in an artifact window, there are 9 buttons on the UI

None of them are to copy the code

June 14, 2025 at 5:33 AM

Clayton Thorrez

@cthorrez.bsky.social

I want to train an 16 billion parameter model.

Specifically, a 16 billion parameter TrueSkill model which fits a skill mean and variance for each of the 8 billion people on earth.

But in my quest to scale rating systems, I guess I start with lichess, with 6B games and a measly 20M unique players

June 9, 2025 at 5:49 AM

Clayton Thorrez

@cthorrez.bsky.social

🚨NEW MARK GLICKMAN PAPER🚨
Paired comparison models with strength-dependent ties and order effects
arxiv.org/abs/2505.24783

Basically what the title says, extending Bradley-Terry with ties, home field advantage, and allowing those factors to depend on how strong the competitors are.

Paired comparison models with strength-dependent ties and order effects

Paired comparison models, such as the Bradley-Terry (1952) model and its variants, are commonly used to measure competitor strength in games and sports. Extensions have been proposed to account for or...

arxiv.org

June 5, 2025 at 9:55 PM

Clayton Thorrez

@cthorrez.bsky.social

Fear leads to anger, anger leads to Rēvolfed, Rēvolfed is the path the dark side

cait (and adonis) @cait.bsky.social · Jun 5

a friend of mine shared this ai-generated "emotion wheel" and unfortunately i have been laughing my ass off at it for like 15 minutes now. today i am feeling Fnliinneon

emotion wheel starting with "joy, fear, anger" in the center and then lapsing into gibberish

June 5, 2025 at 4:24 PM

Clayton Thorrez

@cthorrez.bsky.social

Anyone know how to report a bug in Google Scholar? Online seems like there is not great public support. It's not my paper just one I reference a lot.

Google thinks TrueSkill is in Russian!
scholar.google.com/scholar?hl=e...

June 5, 2025 at 4:22 PM

Clayton Thorrez

@cthorrez.bsky.social

This is the post/thread that made me believe bsky has a future

Lilah Sturges @lilahsturges.com · Jun 3

Are all muppets the same species just with wildly different phenotypes like dogs or are there a lot of different species of muppet

June 4, 2025 at 3:42 PM

Clayton Thorrez

@cthorrez.bsky.social

Question for my rating systems peeps: what does the RD in Glicko, or the sigma in TrueSkill represent? How do you interpret that number?

June 3, 2025 at 7:52 PM

Clayton Thorrez

@cthorrez.bsky.social

Is this like a numerical issue on wolfram or am I missing something? This should be 0 right?
www.wolframalpha.com/input?i=%281...

June 1, 2025 at 11:05 PM

Clayton Thorrez

@cthorrez.bsky.social

Ok I've officially had an idea for a modification of Elo that is consistently an improvement over base Elo averged over 20 datasets with significant baseline hyperparameter tuning.

So I've passed the state of the art form the 1960's. The bad news is that it still gets mogged by Glicko.

May 31, 2025 at 6:09 PM

Clayton Thorrez

@cthorrez.bsky.social

look I'm a big fan of WSL, and I used it for all of my side projects, but it also has a enough issues to require me to make this powershell script lol

May 31, 2025 at 6:23 AM

Clayton Thorrez

@cthorrez.bsky.social

Interesting little finding, in Elo there are basically 2 hyperparameters, k which is the learning rate/step size, and scale, which is the temperature of the softmax. Most people understand k, high k means big rating changes after wins and losses, low k means small updates.

May 31, 2025 at 1:14 AM

Clayton Thorrez

@cthorrez.bsky.social

Pretty cool that I wrote the currently deployed ranking code for a company now valued at 600 million dollars 🤯🤯🤯

github.com/lm-sys/FastC...

techcrunch.com/2025/05/21/l...

Accelerate Bradley Terry MLE model fitting by cthorrez · Pull Request #3523 · lm-sys/FastChat

Why are these changes needed? The bootstrap Bradley Terry model takes upwards of 15 minutes to run for 100 samples. This is costly on resources and hinders experiments such as studying hyperparamet...

github.com

May 22, 2025 at 8:53 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news