To wrap up: this framework doesn’t solve every debate about clustering similarity…
…but it does finally give us a shared language for understanding why different measures disagree, and how they fit together.
If you’re curious, the preprint is here👇
arxiv.org/abs/2511.03000
Thanks for reading! 🧵✨
I'm also able to show that information-theoretic measures can be approximated using higher-order tuple counting (triplets, quads, …) built on top of pair counting.
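As a rough sketch of what "higher-order tuple counting" means (my toy illustration here, not the construction from the paper): for order k, count the size-k tuples of items that land inside a single cluster under both clusterings; k = 2 recovers classic pair counting.

```python
from collections import Counter
from math import comb

def tuple_agreement(a, b, k=2):
    # Fraction of size-k item tuples that both clusterings place
    # inside a single (joint) cluster; k=2 is the classic
    # pair-counting statistic, k=3 counts triplets, and so on.
    joint = Counter(zip(a, b))  # contingency-table cell sizes
    together = sum(comb(c, k) for c in joint.values())
    return together / comb(len(a), k)
```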
So…I’m excited to share my paper introducing a unified framework for clustering similarity:
Both pair-counting and information-theoretic measures are expressed as algebraic expansions around “independence”, and the framework pinpoints exactly which terms differ.
As a community, we have plenty of examples of when the measures differ, but we’ve lacked a principled framework explaining why these measures disagree, how they relate, and whether they’re reconcilable.
And honestly?
It always bothered me.
If you’ve ever used examples from both families, there’s a good chance:
The pair-counting score says these clusterings are nearly identical!
The information-theoretic score says they share almost no structure!
…and you’re left thinking:
“How can both be ‘right’?”
Pair-counting measures think in terms of pairs of items:
“How many pairs of items did both clusterings put together?… or apart?”
Information-theoretic measures ask instead:
“How much uncertainty remains in one clustering given the other?”
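The two questions above can be sketched in a few lines of plain Python (a toy illustration, not the paper's formalism):

```python
from collections import Counter
from itertools import combinations
from math import log

def rand_index(a, b):
    """Pair-counting view: fraction of item pairs on which the two
    clusterings agree (together in both, or apart in both)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def conditional_entropy(a, b):
    """Information-theoretic view: H(A | B), the uncertainty left in
    clustering `a` once clustering `b` is known (in nats)."""
    n = len(a)
    joint = Counter(zip(a, b))   # contingency-table cells
    marginal_b = Counter(b)
    return -sum(c / n * log((c / n) / (marginal_b[bj] / n))
                for (_, bj), c in joint.items())
```

Identical clusterings give a Rand index of 1 and zero conditional entropy; the interesting cases are everything in between.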
Broadly, the community has coalesced around two major families of clustering similarity measures:
Pair-counting measures
(e.g., Rand, Adjusted Rand, Jaccard)
…and
Information-theoretic measures
(e.g., Mutual Information, NMI, Variation of Information)
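If you want to play with these, most of the named measures ship with scikit-learn (assuming you have a reasonably recent version installed):

```python
from sklearn.metrics import (adjusted_rand_score, rand_score,
                             mutual_info_score,
                             normalized_mutual_info_score)

a = [0, 0, 1, 1, 2, 2]
b = [1, 1, 0, 0, 2, 2]  # same partition, clusters relabeled

# All of these are invariant to relabeling the clusters:
print(rand_score(a, b))                    # 1.0
print(adjusted_rand_score(a, b))           # 1.0
print(normalized_mutual_info_score(a, b))  # 1.0
print(mutual_info_score(a, b))             # ≈ ln(3) nats
```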
Clustering is everywhere in science: communities in social networks, customer segments in marketing, functional groups in biology.
And yet, there’s no universal “right” answer for how similar two clusterings are.
Turns out: measuring similarity between clusterings is surprisingly deep, subtle, and messy.