Our long-term goal is to directly connect database tables to machine learning estimators.
https://skrub-data.org
https://discord.gg/ABaPnm7fDC
ApplyToCols lets you select a subset of columns in your dataframe, then applies a transformer to each selected column separately.
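To make the "select columns, then transform each one separately" idea concrete, here is a minimal plain-Python sketch. It is not skrub's implementation or API: the dict-of-lists table layout and the helper names `apply_to_cols` and `min_max_scale` are assumptions made up for illustration.

```python
# Plain-Python sketch of "select columns, then transform each one separately".
# The table layout (column name -> list of values) and the helper names are
# illustrative assumptions, not skrub's API.

def apply_to_cols(table, transform, cols):
    """Return a new table where `transform` was applied to each column in `cols`."""
    selected = set(cols)
    missing = selected.difference(table)
    if missing:
        raise KeyError(f"unknown columns: {sorted(missing)}")
    # Selected columns are transformed one at a time; the rest are copied as-is.
    return {
        name: transform(values) if name in selected else list(values)
        for name, values in table.items()
    }


def min_max_scale(values):
    """Rescale one column to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]


table = {
    "age": [20, 30, 40],
    "income": [1000, 3000, 2000],
    "city": ["Paris", "Lyon", "Paris"],
}
scaled = apply_to_cols(table, min_max_scale, cols=["age", "income"])
```

Each selected column is scaled using only its own minimum and maximum, and the unselected `city` column passes through unchanged.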
skrub-data.org/skrub-materi...
Our talk was very well received, and we got a lot of great questions, especially about scalability and how to interface with other libraries in production environments.
Big thumbs up for the sklearn team & the maintainer of this package
If you want to contribute to skrub, we will also have a sprint on Thursday.
See you there!
"Skrub: machine learning for dataframes", by Guillaume Lemaitre, Jérôme Dockès and @riccardocappuzzo.com.
@skrub-data.bsky.social
📜 Talk info: pretalx.com/pydata-paris-2025/talk/T9KTPU
📅 Schedule: pydata.org/paris2025/schedule
🎟 Tickets: pydata.org/paris2025/tickets
To address this, skrub generates a parallel coordinate plot that visualizes all runs and the parameters used to achieve specific results.
Then you might want to try the skrub SquashingScaler. The SquashingScaler behaves like scikit-learn RobustScaler, but smoothly clips outliers to predefined boundaries.
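The idea of robust scaling followed by smooth clipping can be sketched in plain Python. This is not skrub's code: the function name `squashing_scale`, the default bound of 3.0, and the specific squashing formula `z / sqrt(1 + (z / B)^2)` are assumptions chosen to illustrate one way of clipping smoothly to a fixed boundary.

```python
import math
import statistics


def squashing_scale(values, max_absolute_value=3.0):
    """Center on the median, scale by the interquartile range, then soft-clip.

    Hypothetical helper: the formula below is an assumed example of smooth
    clipping, not necessarily the one skrub uses.
    """
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # avoid dividing by zero on near-constant columns
    scaled = [(v - median) / iqr for v in values]
    bound = max_absolute_value
    # Close to the identity for small z, approaches +/- bound for large |z|.
    return [z / math.sqrt(1.0 + (z / bound) ** 2) for z in scaled]


values = [1, 2, 3, 4, 5, 1000]
squashed = squashing_scale(values)
```

Under these assumptions the outlier `1000` ends up just below the bound instead of dominating the scale, while values near the median are left almost unchanged and the ordering of the inputs is preserved.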
Real tables are too messy for sklearn - skrub preprocesses them for you.
Release 0.6.1 fixes a bug that may happen when combining certain column-based skrub transformers with the scikit-learn ColumnTransformer.
github.com/skrub-data/s...
We had the opportunity to show off the features of skrub to a wide audience, and to show how they can be used in a fairly complex use case.
Here is the repo with the material for the tutorial: github.com/skrub-data/E...
🚀 Major update! Skrub DataOps, various improvements for the TableReport, new tools for applying transformers to the columns, and a new robust transformer for numerical features are only some of the features included in this release.
Maintenance release!
This release makes skrub compatible with scikit-learn 1.7.
Changelog:
skrub-data.org/stable/CHANG...
This time we will focus on how expressions can simplify the construction of complex hyperparameter grids.
🔍 It gives a uniform representation of null values, converting those represented as strings (such as "N/A")
🗑️ It drops columns that contain too many null values (according to a user-defined threshold)
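The two cleaning steps above can be sketched in plain Python. Everything here is an illustrative assumption rather than skrub's behavior: the list of null-like strings, the helper names `normalize_nulls` and `clean`, and the rule that a column is dropped when its null fraction is strictly greater than the threshold.

```python
import math

# Illustrative list of strings treated as missing; the real set may differ.
NULL_STRINGS = {"", "n/a", "na", "null", "none", "nan"}


def normalize_nulls(values):
    """Replace null-like strings and float NaN with None."""
    normalized = []
    for value in values:
        is_null_string = isinstance(value, str) and value.strip().lower() in NULL_STRINGS
        is_nan = isinstance(value, float) and math.isnan(value)
        normalized.append(None if is_null_string or is_nan else value)
    return normalized


def clean(table, max_null_fraction=0.5):
    """Normalize nulls, then drop columns whose null fraction exceeds the threshold."""
    cleaned = {}
    for name, values in table.items():
        normalized = normalize_nulls(values)
        n_nulls = sum(v is None for v in normalized)
        null_fraction = n_nulls / len(normalized) if normalized else 0.0
        if null_fraction <= max_null_fraction:
            cleaned[name] = normalized
    return cleaned


table = {
    "id": [1, 2, 3, 4],
    "price": [10, "N/A", 12, 11],
    "notes": ["N/A", None, "null", "ok"],
}
cleaned = clean(table, max_null_fraction=0.5)
```

With a threshold of 0.5, `price` (one null out of four) is kept with its `"N/A"` turned into `None`, while `notes` (three nulls out of four) is dropped.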
As this is a preview of an upcoming feature, we are looking for your thoughts and feedback before release.
✅ Filter columns
🔎 Look at each column's distribution
📊 Get a high-level view of the distributions through stats and plots, including correlated columns
🌐 Export the report as HTML
skrub-data.org/skrub-materi...
Here you'll find introductory talks and tutorials about Skrub, along with notebooks and blog posts showcasing the features of the library.
Bookmark it so you don't miss any updates 👀
Skrub provides four encoders to convert strings into numerical features. 🤗 models included!
Which one is best? Check out our blog post to find out 👀
skrub-data.org/skrub-materi...
github.com/GaelVaroquau...