Gaël Varoquaux
gaelvaroquaux.bsky.social
Research & code: Research director @inria
►Data, Health, & Computer science
►Python coder, (co)founder of scikit-learn, joblib, & @probabl.bsky.social
►Sometimes does art photography
►Physics PhD
We also learned that performance on link prediction, the canonical task of knowledge-graph embedding, is not a good proxy for downstream utility.

We believe this is because link prediction only needs local structure, unlike downstream tasks
9/10
November 28, 2025 at 3:46 PM
Our approach, SEPAL, combines these elements for feature learning on large knowledge graphs.

It creates feature vectors that lead to better performance on downstream tasks, and it is more scalable.
Larger knowledge graphs give feature vectors that provide downstream value
8/10
November 28, 2025 at 3:46 PM
Splitting huge knowledge graphs into sub-parts is actually hard because of the mix of very highly-connected nodes and a huge long tail that is hard to reach.

We introduce a procedure that allows for overlap between the blocks, which relaxes the difficulty a lot.
7/10
November 28, 2025 at 3:46 PM
To get a very efficient algorithm, we split the graph into overlapping, highly-connected blocks that fit in GPU memory.
Propagation is then simple in-memory iterations, and we embed huge graphs on a single GPU.
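The block structure can be sketched in a few lines. This is a toy illustration only, not SEPAL's actual partitioning procedure: the edge list, the block assignment, and the overlap are all made up for the example.

```python
# Toy graph as an edge list (illustrative, not from the paper)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (2, 5)]

# Two blocks that overlap on nodes 0 and 2, so each sub-part can be
# embedded entirely in memory while still agreeing on shared nodes
blocks = [{0, 1, 2, 5}, {0, 2, 3, 4}]

def block_edges(block, edges):
    # Keep only the edges whose two endpoints are inside the block
    return [(u, v) for u, v in edges if u in block and v in block]

# Each subgraph is small enough to fit in (GPU) memory;
# propagation then reduces to simple in-memory iterations on it
subgraphs = [block_edges(b, edges) for b in blocks]
```

The overlap is the key relaxation: nodes shared between blocks tie the per-block embeddings together without requiring a clean cut through the hub nodes.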
6/10
November 28, 2025 at 3:46 PM
Knowledge graphs have long-tailed entity distributions, with many weakly-connected entities for which contrastive learning is under-constrained.
For these, we propagate embeddings via the relation operators, in a diffusion-like step, extrapolating from the central entities.
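A minimal sketch of that extrapolation, assuming TransE-style relations (relation = translation vector). The entity names, embeddings, and triples are invented for illustration; SEPAL's actual operators and aggregation may differ.

```python
import numpy as np

# Embedding of a well-connected "core" entity (toy values)
core = {"Paris": np.array([1.0, 0.0, 0.0, 0.0])}
# Relation as an operator on the embedding space (here: a translation)
relations = {"capital_of": np.array([-1.0, 1.0, 0.0, 0.0])}
# Triples whose tail is a weakly-connected entity outside the core
triples = [("Paris", "capital_of", "France")]

def propagate(entity):
    # Diffusion-like step: average head + relation over all triples
    # that end at `entity` and whose head is already embedded
    msgs = [core[h] + relations[r]
            for h, r, t in triples if t == entity and h in core]
    return np.mean(msgs, axis=0)

emb_france = propagate("France")
```

With several incoming triples, the average combines multiple extrapolations, so long-tail entities inherit a position consistent with the core geometry.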
5/10
November 28, 2025 at 3:46 PM
Our approach uses contrastive learning on a core subset of entities to capture the large-scale structure.
Consistent with knowledge-graph embedding literature, this step represents relations as operators on the embedding space.
It also anchors the central entities.
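As a concrete instance of "relations as operators", here is a minimal TransE-style contrastive step with a margin loss. The vectors and the negative sample are toy values; this sketches the standard knowledge-graph-embedding objective, not SEPAL's exact loss.

```python
import numpy as np

def score(h, r, t):
    # TransE-style: a plausible triple has head + relation ≈ tail
    return -np.linalg.norm(h + r - t)

def margin_loss(pos_score, neg_score, margin=1.0):
    # Contrastive objective: the true triple must beat the corrupted
    # (negative) triple by at least `margin`
    return max(0.0, margin - pos_score + neg_score)

h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])      # relation acting as a translation operator
t_pos = np.array([1.0, 1.0])  # true tail: h + r == t_pos
t_neg = np.array([1.0, 1.5])  # corrupted tail (negative sample)

loss = margin_loss(score(h, r, t_pos), score(h, r, t_neg))
```

Minimizing such a loss over the core triples anchors the central entities and fits the relation operators that the propagation step later reuses.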
4/10
November 28, 2025 at 3:46 PM
Our paper shows that message passing is a great tool to build feature vectors from graphs

As opposed to contrastive learning, message passing helps embeddings represent the large-scale structure of the graph (it gives Arnoldi-type iterations).
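The Arnoldi connection can be seen in a toy power-iteration sketch: repeatedly multiplying features by a normalized adjacency matrix builds the same Krylov subspace an Arnoldi iteration explores. The random graph and feature dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric adjacency matrix of a small random graph
n = 8
A = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = A + A.T
deg = A.sum(axis=1, keepdims=True)
P = A / np.clip(deg, 1, None)  # row-normalized propagation operator

X = rng.normal(size=(n, 2))    # random initial node features
for _ in range(20):            # message-passing steps
    X = P @ X                  # each step mixes information one hop further
    X /= np.linalg.norm(X, axis=0)  # rescale columns for stability

# After many steps, the columns align with dominant eigenvectors of P,
# i.e. the large-scale (spectral) structure of the graph -- the subspace
# a power/Arnoldi-type iteration builds.
```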

3/10
November 28, 2025 at 3:46 PM
Graphs can represent knowledge and have scaled to huge sizes (>115M entities in Wikidata).
How to distill these into good downstream features, e.g. for machine learning?
The challenge is to create feature vectors, and for this graph embeddings have been invaluable.
2/10
November 28, 2025 at 3:46 PM
#NeurIPS2025 paper: Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning

Combining contrastive learning and message passing markedly improves the features created by embedding graphs, and scales to huge graphs.
It taught us a lot about graph feature learning 👇
1/10
November 28, 2025 at 3:46 PM
Yesterday, I did 2 AI conferences in Paris; different kinds of vibes.
AdoptAI, a business show: pretty videos, promises about AI, 2.30€ coffee
NeurIPS@Paris, a research conference: equations, free coffee

AI business wouldn't exist without research. Let's not forget it and keep investing in research
November 27, 2025 at 8:06 PM
I hate to say it, but this is one reason why I co-authored:
dl.acm.org/doi/10.1145/...

I genuinely think that we need to collectively act to change the narrative and social norms
November 13, 2025 at 7:49 PM
I'm doing the end-of-project reporting for an ANR grant ( @agencerecherche.bsky.social ) and I admit that I have to suppress some very negative feelings about the degree of micromanagement and detail required, for a report that will essentially never be read.

It's a waste of public money
November 13, 2025 at 5:00 PM
One of my collaborators sent me a @skrub-data.bsky.social TableReport as an HTML file, with which I can interact and explore the data to give him feedback.
An ideal workflow, as far as I am concerned: async, yet interactive, and needing no infrastructure
October 23, 2025 at 8:10 PM
The full text is here: I kept it short, but it is deeply meaningful to me
gael-varoquaux.info/personnal/a-...
October 10, 2025 at 11:37 AM
Let's keep in mind that we can redefine what is cool, and not play others' game.

Define what we're proud of:

Bigger is not better
Simplicity is a virtue
Tech for the many
October 1, 2025 at 4:42 PM
Clouds naturally bring improved infrastructure. But they also enable spying and control.

We need to be careful whom we platform. Tech lords sometimes have the wrong political connections.
October 1, 2025 at 4:42 PM
More efficient computing won't suffice.

Efficiency improvements are super useful. But demand will increase even more and catch up. Such a rebound effect is very classic with technology, e.g. with transportation or energy.

It's really behaviors that condition resource usage (e.g. bike > SUV)
October 1, 2025 at 4:42 PM
The good news is: tech keeps improving.
Software and algorithms keep getting better, as well as large compute and data infrastructures.
October 1, 2025 at 4:42 PM
There are financial considerations indeed. AI actors are burning a crazy amount of money.

But high costs are not always a bad thing (if you own Nvidia stock)
October 1, 2025 at 4:42 PM
It's thanks to Moore's law, right? Computation is getting more efficient…

Well, the cost has been exploding (exponentially indeed).

So it's really about pouring in more and more money
October 1, 2025 at 4:42 PM
The story is that we're getting there by waiting for faster GPUs and bigger datasets

And indeed, the compute used has exploded, in super-exponential growth, going way beyond the daily compute of the biggest computers
October 1, 2025 at 4:42 PM
It's cool because it promises really amazing, very great, awesome productivity gains

(look at those studies by Microsoft, IBM, Google...)
October 1, 2025 at 4:42 PM
So, what's cool in tech?

Well, AI is cool...

it's all over the news, the people in the pictures look healthy and happy (and also white and male), and there is always a big dollar amount attached
October 1, 2025 at 4:42 PM
A normative framework is the set of implicit rules and values that define the normal

What is "normal" is cultural by nature
October 1, 2025 at 4:42 PM
Come to my lightning talk
At @pydataparis.bsky.social in a few minutes
October 1, 2025 at 3:06 PM