Skrub
@skrub-data.bsky.social
skrub is a Python library to ease preprocessing and feature engineering for tabular machine learning.
Our long-term goal is to directly connect database tables to machine learning estimators.

https://skrub-data.org
https://discord.gg/ABaPnm7fDC
Skrub includes a powerful set of transformers and selectors that let you transform columns based on various conditions.

ApplyToCols lets you select a subset of columns in your dataframe, then applies a transformer to each selected column separately.
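
To make the "select, then transform each column separately" idea concrete, here is a minimal plain-Python sketch. It is only an illustration of the concept and not skrub's implementation or API: the helper names (`apply_to_cols`, `is_numeric`, `min_max_scale`) and the dict-of-lists table are hypothetical.

```python
# Illustrative only: mimics "select columns, transform each one separately"
# with plain Python. This is NOT skrub's ApplyToCols implementation.


def apply_to_cols(table, transform, select):
    """Return a new table where `transform` is applied to each selected column.

    `table` maps column names to lists of values. `select` is a predicate
    taking (name, values) and returning True for columns to transform.
    """
    result = {}
    for name, values in table.items():
        if select(name, values):
            # Each selected column is transformed on its own.
            result[name] = transform(values)
        else:
            # Unselected columns pass through unchanged.
            result[name] = list(values)
    return result


def is_numeric(name, values):
    """Hypothetical selector: keep columns whose values are all numbers."""
    return all(isinstance(v, (int, float)) for v in values)


def min_max_scale(values):
    """Hypothetical per-column transformer: rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]


table = {
    "city": ["Paris", "London", "Rome"],
    "population_m": [2.1, 8.9, 2.8],
    "area_km2": [105, 1572, 1285],
}
scaled = apply_to_cols(table, min_max_scale, is_numeric)
```

Here the string column is left untouched, while each numeric column is rescaled independently using its own minimum and maximum.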
October 8, 2025 at 12:43 PM
Reposted by Skrub
Have we already told you that Skrub is cool? And that @riccardocappuzzo.com's talk was really great? Well, have we?
skrub-data.org/skrub-materi...
October 7, 2025 at 2:44 PM
@pydataparis.bsky.social 2025 is over, and it was a big success!

Our talk was very well received, and we got a lot of great questions, especially about scalability and how to interface with other libraries in production environments.
October 7, 2025 at 2:36 PM
Reposted by Skrub
What a banger skrub is, @skrub-data.bsky.social!

Big thumbs up for the sklearn team & the maintainer of this package
October 1, 2025 at 8:24 AM
📅 Less than a week away! The talk will be on Oct 1st at 10:05 AM in room Louis Armand 1 - Est.

If you want to contribute to skrub, we will also have a sprint on Thursday.

See you there!
📢 Talk Announcement

"Skrub: machine learning for dataframes", by Guillaume Lemaitre, Jérôme Dockès and @riccardocappuzzo.com.
@skrub-data.bsky.social

📜 Talk info: pretalx.com/pydata-paris-2025/talk/T9KTPU
📅 Schedule: pydata.org/paris2025/schedule
🎟 Tickets: pydata.org/paris2025/tickets
September 26, 2025 at 8:50 AM
Reposted by Skrub
Reminder: skrub == cool
September 12, 2025 at 1:34 PM
skrub DataOps help you construct complex and extensive hyperparameter search spaces. However, interpreting results from large grids can be challenging.
To address this, skrub generates a parallel coordinate plot that visualizes all runs and the parameters used to achieve specific results.
September 12, 2025 at 12:56 PM
Do you have to deal with numerical features that involve large outliers, and need to train linear models or neural networks?

Then you might want to try the skrub SquashingScaler. The SquashingScaler behaves like scikit-learn's RobustScaler, but smoothly clips outliers to predefined boundaries.
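
As a rough illustration of "robust scaling plus smooth clipping", here is a plain-Python sketch. It assumes that values are centered on the median, divided by the interquartile range, and then squashed with z / sqrt(1 + (z / B)^2), where B is the boundary. The function name `squash_scale`, the `max_abs` parameter, and the choice of squashing function are assumptions for this example. Check the skrub documentation for the actual behavior.

```python
import math
import statistics


def squash_scale(values, max_abs=3.0):
    """Robust-scale values, then softly clip them into (-max_abs, max_abs).

    Assumption: the soft clip is z / sqrt(1 + (z / max_abs) ** 2).
    This illustrates the idea and is not skrub's actual code.
    """
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = (q3 - q1) or 1.0  # avoid division by zero for constant data
    out = []
    for v in values:
        z = (v - median) / iqr  # robust scaling, as in a RobustScaler
        # Close to z for small |z|, approaches +/- max_abs for large |z|.
        out.append(z / math.sqrt(1.0 + (z / max_abs) ** 2))
    return out


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]
scaled = squash_scale(data)
```

With this input, the outlier 1000.0 ends up just below the boundary of 3 instead of dominating the scale, while the ordering of the values is preserved.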
September 5, 2025 at 8:47 AM
Reposted by Skrub
Our first talk tonight is from @gaelvaroquaux.bsky.social on @skrub-data.bsky.social.

Real tables are too messy for sklearn - skrub preprocesses them for you.
September 2, 2025 at 6:28 PM
Reposted by Skrub
Had a great PyData London tonight! Was a real treat to hear from @gaelvaroquaux.bsky.social on @skrub-data.bsky.social and the real-world data pains it's solving. (Try it if you haven't already; super easy to get going!)
September 3, 2025 at 12:06 AM
⚡Maintenance release ⚡

Release 0.6.1 fixes a bug that can occur when combining certain column-based skrub transformers with the scikit-learn ColumnTransformer.

github.com/skrub-data/s...
Release Skrub release 0.6.1 · skrub-data/skrub
Bugfixes get_feature_names_out now works correctly when used by GapEncoder, DropCols, SelectCols: from within a scikit-learn Pipeline. In addition, DropCols’s get_feature_names_out method now retu...
github.com
August 29, 2025 at 3:59 PM
We had a great tutorial at #EuroScipy2025!

We had the opportunity to show off skrub's features to a wide audience, and to demonstrate how they can be used in a fairly complex use case.
Attending the @skrub-data.bsky.social tutorial by @riccardocappuzzo.com and @glemaitre58.bsky.social at #EuroScipy2025. They introduce the new DataOps feature released in skrub 0.6.

Here is the repo with the material for the tutorial: github.com/skrub-data/E...
August 29, 2025 at 3:57 PM
⚡ Release 0.6.0 is now out! ⚡

🚀 Major update! Skrub DataOps, various improvements for the TableReport, new tools for applying transformers to the columns, and a new robust transformer for numerical features are only some of the features included in this release.
July 24, 2025 at 3:55 PM
📅 The skrub API includes various functions and objects that help you work with datetime strings. 1/
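
For readers unfamiliar with the problem, this standard-library sketch shows the kind of work involved: parsing datetime strings and turning them into numeric features. It does not use skrub. The function name `encode_datetimes`, the format string, and the chosen features are assumptions made for this example.

```python
from datetime import datetime


def encode_datetimes(strings, fmt="%Y-%m-%d %H:%M"):
    """Parse datetime strings and extract simple numeric features.

    Illustrative only: a hypothetical helper, not part of the skrub API.
    """
    rows = []
    for text in strings:
        dt = datetime.strptime(text, fmt)  # raises ValueError on a bad format
        rows.append(
            {
                "year": dt.year,
                "month": dt.month,
                "day": dt.day,
                "hour": dt.hour,
                "weekday": dt.isoweekday(),  # 1 = Monday, 7 = Sunday
            }
        )
    return rows


features = encode_datetimes(["2025-06-19 12:45", "2025-10-08 12:43"])
```

Each input string becomes one row of integer features that a downstream estimator can consume directly.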
June 19, 2025 at 12:45 PM
🚀⚡ Release: 0.5.4
Maintenance release!
This release makes skrub compatible with scikit-learn 1.7.

Changelog:
skrub-data.org/stable/CHANG...
Release history
Release 0.5.4: Maintenance: Make skrub compatible with scikit-learn 1.7. #1434 by Vincent Maladiere. Release 0.5.3: Changes: The SimpleCleaner has been renamed to Cleaner. Use of the name SimpleCle...
skrub-data.org
June 7, 2025 at 4:06 PM
👀 This week's post will be another sneak peek into skrub expressions, an upcoming feature that will ease the preparation and execution of machine learning pipelines on dataframes.

This time we will focus on how expressions can simplify the construction of complex hyperparameter grids.
June 4, 2025 at 12:46 PM
📝 The skrub TextEncoder brings the power of HuggingFace language models to embed text features in tabular machine learning, for all those use cases that involve text-based columns.
May 28, 2025 at 8:43 AM
The Skrub Cleaner is a lightweight transformer that performs consistency checks on a dataframe:

🔍 It gives a uniform representation of null values, converting those represented as strings (such as "N/A") to actual nulls
🗑️ It drops columns that contain too many null values (according to a user-defined threshold)
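
The two checks above can be sketched in plain Python as follows. This is an illustration of the idea and not the Cleaner's implementation: the `NULL_STRINGS` set, the `clean` function, the `max_null_fraction` parameter, and the "drop when the fraction exceeds the threshold" rule are all assumptions made for this example.

```python
# Hypothetical list of strings treated as nulls. The real Cleaner may
# recognize a different set.
NULL_STRINGS = {"", "n/a", "na", "nan", "null", "none", "?"}


def clean(table, max_null_fraction=0.5):
    """Normalize null-like strings to None and drop mostly-null columns.

    `table` maps column names to lists of values. Columns whose fraction
    of nulls exceeds `max_null_fraction` are dropped (assumed rule).
    """
    cleaned = {}
    for name, values in table.items():
        # Step 1: give nulls a uniform representation.
        column = [
            None if isinstance(v, str) and v.strip().lower() in NULL_STRINGS else v
            for v in values
        ]
        # Step 2: drop columns that contain too many nulls.
        null_fraction = (
            sum(v is None for v in column) / len(column) if column else 0.0
        )
        if null_fraction <= max_null_fraction:
            cleaned[name] = column
    return cleaned


table = {
    "name": ["Ada", "N/A", "Grace"],
    "fax": ["N/A", None, "null"],
    "age": [36, 45, None],
}
cleaned = clean(table, max_null_fraction=0.5)
```

In this example the "fax" column is dropped because every value is null after normalization, while "name" and "age" are kept with their nulls represented uniformly as `None`.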
May 21, 2025 at 8:53 AM
👀 This week's post is a sneak peek into the next major Skrub feature, Skrub expressions 🚀

As this is a preview of an upcoming feature, we are looking for your thoughts and feedback before release.
April 30, 2025 at 10:00 AM
The Skrub TableReport is a lightweight tool that lets you get a rich overview of a table quickly and easily.

✅ Filter columns
🔎 Look at each column's distribution
📊 Get a high-level view of the distributions through stats and plots, including correlated columns
🌐 Export the report as HTML
April 23, 2025 at 11:49 AM
🚀 The Skrub learning materials website is now live at:
skrub-data.org/skrub-materi...

Here you'll find introductory talks and tutorials about Skrub, along with notebooks and blog posts showcasing the features of the library.

Bookmark it so you don't miss any updates 👀
Skrub learning materials index – Skrub learning materials
skrub-data.org
April 9, 2025 at 9:08 AM
🚀⚡ Release: 0.5.3

Check out the release notes:
skrub-data.org/stable/CHANG...

Highlights below ⤵️
April 3, 2025 at 4:49 PM
🗒️ Do you need to prepare an ML model, and you are working with text and strings?
Skrub provides four encoders to convert strings into numerical features. 🤗 models included!

Which one is best? Check out our blog post to find out 👀

skrub-data.org/skrub-materi...
What’s the best way to encode categorical features? A use case with Skrub encoders – Skrub learning materials
skrub-data.org
March 26, 2025 at 8:50 AM