Gaël Varoquaux
gaelvaroquaux.bsky.social
Research & code: Research director @inria
►Data, Health, & Computer science
►Python coder, (co)founder of scikit-learn, joblib, & @probabl.bsky.social
►Sometimes does art photography
►Physics PhD
We also learned that performance on link prediction, the canonical task of knowledge-graph embedding, is not a good proxy for downstream utility.

We believe this is because link prediction only needs local structure, unlike downstream tasks
9/10
November 28, 2025 at 3:46 PM
Our approach, SEPAL, combines these elements for feature learning on large knowledge graphs.

It creates feature vectors that lead to better performance on downstream tasks, and it is more scalable.
Larger knowledge graphs give feature vectors that provide downstream value
8/10
November 28, 2025 at 3:46 PM
Splitting huge knowledge graphs into sub-parts is actually hard because of the mix of very highly-connected nodes and a huge long tail that is hard to reach.

We introduce a procedure that allows for overlap between the blocks, which relaxes the difficulty a lot.
7/10
November 28, 2025 at 3:46 PM
To get a very efficient algorithm, we split the graph into overlapping, highly-connected blocks that fit in GPU memory.
Propagation is then simple in-memory iterations, and we embed huge graphs on a single GPU.
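The block structure can be sketched in a few lines. This is a toy illustration only, not SEPAL's actual partitioning procedure: the edge list, the block assignment, and the overlap are all made up for the example.

```python
# Toy graph as an edge list (illustrative, not from the paper)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (2, 5)]

# Two blocks that overlap on nodes 0 and 2, so each sub-part can be
# embedded entirely in memory while still agreeing on shared nodes
blocks = [{0, 1, 2, 5}, {0, 2, 3, 4}]

def block_edges(block, edges):
    # Keep only the edges whose two endpoints are inside the block
    return [(u, v) for u, v in edges if u in block and v in block]

# Each subgraph is small enough to fit in (GPU) memory;
# propagation then reduces to simple in-memory iterations on it
subgraphs = [block_edges(b, edges) for b in blocks]
```

The overlap is the key relaxation: nodes shared between blocks tie the per-block embeddings together without requiring a clean cut through the hub nodes.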
6/10
November 28, 2025 at 3:46 PM
Knowledge graphs have long-tailed entity distributions, with many weakly-connected entities for which contrastive learning is under-constrained.
For these, we propagate embeddings via the relation operators, in a diffusion-like step, extrapolating from the central entities.
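A minimal sketch of that extrapolation, assuming TransE-style relations (relation = translation vector). The entity names, embeddings, and triples are invented for illustration; SEPAL's actual operators and aggregation may differ.

```python
import numpy as np

# Embedding of a well-connected "core" entity (toy values)
core = {"Paris": np.array([1.0, 0.0, 0.0, 0.0])}
# Relation as an operator on the embedding space (here: a translation)
relations = {"capital_of": np.array([-1.0, 1.0, 0.0, 0.0])}
# Triples whose tail is a weakly-connected entity outside the core
triples = [("Paris", "capital_of", "France")]

def propagate(entity):
    # Diffusion-like step: average head + relation over all triples
    # that end at `entity` and whose head is already embedded
    msgs = [core[h] + relations[r]
            for h, r, t in triples if t == entity and h in core]
    return np.mean(msgs, axis=0)

emb_france = propagate("France")
```

With several incoming triples, the average combines multiple extrapolations, so long-tail entities inherit a position consistent with the core geometry.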
5/10
November 28, 2025 at 3:46 PM
Our approach uses contrastive learning on a core subset of entities to capture the large-scale structure.
Consistent with knowledge-graph embedding literature, this step represents relations as operators on the embedding space.
It also anchors the central entities.
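As a concrete instance of "relations as operators", here is a minimal TransE-style contrastive step with a margin loss. The vectors and the negative sample are toy values; this sketches the standard knowledge-graph-embedding objective, not SEPAL's exact loss.

```python
import numpy as np

def score(h, r, t):
    # TransE-style: a plausible triple has head + relation ≈ tail
    return -np.linalg.norm(h + r - t)

def margin_loss(pos_score, neg_score, margin=1.0):
    # Contrastive objective: the true triple must beat the corrupted
    # (negative) triple by at least `margin`
    return max(0.0, margin - pos_score + neg_score)

h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])      # relation acting as a translation operator
t_pos = np.array([1.0, 1.0])  # true tail: h + r == t_pos
t_neg = np.array([1.0, 1.5])  # corrupted tail (negative sample)

loss = margin_loss(score(h, r, t_pos), score(h, r, t_neg))
```

Minimizing such a loss over the core triples anchors the central entities and fits the relation operators that the propagation step later reuses.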
4/10
November 28, 2025 at 3:46 PM
Our paper shows that message passing is a great tool to build feature vectors from graphs

As opposed to contrastive learning, message passing helps embeddings represent the large-scale structure of the graph (it gives Arnoldi-type iterations).
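The Arnoldi connection can be seen in a toy power-iteration sketch: repeatedly multiplying features by a normalized adjacency matrix builds the same Krylov subspace an Arnoldi iteration explores. The random graph and feature dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy symmetric adjacency matrix of a small random graph
n = 8
A = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = A + A.T
deg = A.sum(axis=1, keepdims=True)
P = A / np.clip(deg, 1, None)  # row-normalized propagation operator

X = rng.normal(size=(n, 2))    # random initial node features
for _ in range(20):            # message-passing steps
    X = P @ X                  # each step mixes information one hop further
    X /= np.linalg.norm(X, axis=0)  # rescale columns for stability

# After many steps, the columns align with dominant eigenvectors of P,
# i.e. the large-scale (spectral) structure of the graph -- the subspace
# a power/Arnoldi-type iteration builds.
```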

3/10
November 28, 2025 at 3:46 PM
Graphs can represent knowledge and have scaled to huge sizes (>115M entities in Wikidata).
How to distill these into good downstream features, e.g. for machine learning?
The challenge is to create feature vectors, and for this graph embeddings have been invaluable.
2/10
November 28, 2025 at 3:46 PM
#NeurIPS2025 paper: Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning

Combining contrastive learning and message passing markedly improves the features created by embedding graphs, and scales to huge graphs.
It taught us a lot about graph feature learning 👇
1/10
November 28, 2025 at 3:46 PM
Yesterday, I did 2 AI conferences in Paris; different kinds of vibes.
AdoptAI, a business show: pretty videos, promises about AI, 2.30€ coffee
NeurIPS@Paris, a research conference: equations, free coffee

AI business wouldn't exist without research. Let's not forget it and keep investing in research
November 27, 2025 at 8:06 PM
I hate to say it, but this is one reason why I co-authored:
dl.acm.org/doi/10.1145/...

I genuinely think that we need to collectively act to change the narrative and social norms
November 13, 2025 at 7:49 PM
I'm doing the end-of-project reporting for an ANR grant ( @agencerecherche.bsky.social ) and I admit that I have to suppress some very negative feelings about the degree of micromanagement and detail required, for a report that will essentially never be read.

It's a waste of public money
November 13, 2025 at 5:00 PM
One of my collaborators sent me a @skrub-data.bsky.social TableReport as an HTML file, with which I can interact and explore the data to give him feedback.
An ideal workflow, as far as I am concerned: async, yet interactive, and needing no infrastructure
October 23, 2025 at 8:10 PM
The full text is here: I kept it short, but it is deeply meaningful to me
gael-varoquaux.info/personnal/a-...
October 10, 2025 at 11:37 AM
Let's keep in mind that we can redefine what is cool, and not play others' game.

Define what we're proud of:

Bigger is not better
Simplicity is a virtue
Tech for the many
October 1, 2025 at 4:42 PM
Clouds naturally bring improved infrastructure. But they also enable spying and control.

We need to be careful whom we platform. Tech lords sometimes have the wrong political connections.
October 1, 2025 at 4:42 PM
More efficient computing won't suffice.

Efficiency improvements are super useful. But demand will increase even more and catch up. Such a rebound effect is very classic with technology, e.g. with transportation or energy.

It's really behaviors that condition resource usage (e.g. bike > SUV)
October 1, 2025 at 4:42 PM
The good news is: tech keeps improving.
Software and algorithms keep getting better, as well as large compute and data infrastructures.
October 1, 2025 at 4:42 PM
There are financial considerations indeed. AI actors are burning a crazy amount of money.

But high costs are not always a bad thing (if you own Nvidia stock)
October 1, 2025 at 4:42 PM
It's thanks to Moore's law, right? Computation is getting more efficient…

Well, the cost has been exploding (exponentially indeed).

So it's really about pouring in more and more money
October 1, 2025 at 4:42 PM
The story is that we're getting there by waiting for faster GPUs and bigger datasets

And indeed, the compute used has exploded, in super-exponential growth, going way beyond the daily compute of the biggest computers
October 1, 2025 at 4:42 PM
It's cool because it promises really amazing, very great, awesome productivity gains

(look at those studies by Microsoft, IBM, Google...)
October 1, 2025 at 4:42 PM
So, what's cool in tech?

Well, AI is cool...

it's all over the news, the people in the pictures look healthy and happy (and also white and male), and there is always a big dollar amount attached
October 1, 2025 at 4:42 PM
A normative framework is the set of implicit rules and values that define the normal

What is "normal" is cultural by nature
October 1, 2025 at 4:42 PM
Come to my lightning talk
At @pydataparis.bsky.social in a few minutes
October 1, 2025 at 3:06 PM