Lightnews — Scholar-powered news

Skrub

@skrub-data.bsky.social

620 followers 48 following 110 posts

skrub is a Python library to ease preprocessing and feature engineering for tabular machine learning.
Our long-term goal is to directly connect database tables to machine learning estimators.

https://skrub-data.org
https://discord.gg/ABaPnm7fDC


Skrub

@skrub-data.bsky.social

skrub-data.org/stable/refer...

ApplyToFrame

Gallery examples: Hands-On with Column Selection and Transformers

skrub-data.org

October 8, 2025 at 12:43 PM

Skrub

@skrub-data.bsky.social

skrub-data.org/stable/refer...

ApplyToCols

Gallery examples: Getting Started; Hands-On with Column Selection and Transformers

skrub-data.org

October 8, 2025 at 12:43 PM

Skrub

@skrub-data.bsky.social

Example: skrub-data.org/stable/auto_...

Hands-On with Column Selection and Transformers

In previous examples, we saw how skrub provides powerful abstractions like TableVectorizer and tabular_pipeline() to create pipelines. In this new example, we show how to create more flexible pipel...

skrub-data.org

October 8, 2025 at 12:43 PM

Skrub

@skrub-data.bsky.social

For even more control over column selection, skrub provides a collection of selectors that let you partition dataframes by data type, column name, or user-specified functions.

October 8, 2025 at 12:43 PM

Skrub

@skrub-data.bsky.social

All these transformers can be chained in a scikit-learn pipeline to build a feature matrix with complex column selection operations, and can be seen as an alternative to the scikit-learn ColumnTransformer.

October 8, 2025 at 12:43 PM

Skrub

@skrub-data.bsky.social

ApplyToFrame selects columns in the same way, but then uses all of them at the same time as input to the transformer: this is useful for dimensionality reduction.
SelectCols and DropCols can be used as "filtering blocks" in a pipeline.

October 8, 2025 at 12:43 PM

Skrub

@skrub-data.bsky.social

Slides:
skrub-data.org/skrub-materi...

Skrub learning materials – Skrub

skrub-data.org

October 7, 2025 at 2:36 PM

Skrub

@skrub-data.bsky.social

Thanks to @riccardocappuzzo.com, @glemaitre58.bsky.social and Jérôme Dockès for preparing the talk and mentoring at the sprint!

October 7, 2025 at 2:36 PM

Skrub

@skrub-data.bsky.social

The sprint was also a big hit, with both new and old contributors working on issues and getting to know the repository.

And to cap it all off, thanks to P16 we have stickers now 🚀

October 7, 2025 at 2:36 PM

Reposted by Skrub

Emilien Schultz

@emilienschultz.bsky.social

What a banger is skrub @skrub-data.bsky.social !

Big thumbs up for the sklearn team & the maintainer of this package

October 1, 2025 at 8:24 AM

Skrub

@skrub-data.bsky.social

🛠️ Main bugfixes
- Fixed the display of DataOp objects in Google Colab cell outputs.
- Fixed the range from which choose_float and choose_int sample values when log=False and n_steps is None.
- The SkrubLearner used to run a prediction on the training set during fit(); this has been fixed.

September 26, 2025 at 8:48 AM

Skrub

@skrub-data.bsky.social

👀 Changes and deprecations
- Ken embeddings are now deprecated.
- The accepted values for the parameter how of .skb.apply() have changed. The new values are "auto", "cols", "frame", and "no_wrap".
- The parameter splitter of .skb.train_test_split() has been renamed split_func.

September 26, 2025 at 8:48 AM

Skrub

@skrub-data.bsky.social

🚀 New features
- DataOp.skb.full_report() now displays the time each node took to evaluate.
- The User guide has been reworked and expanded.

September 26, 2025 at 8:48 AM

Skrub

@skrub-data.bsky.social

Here's another example of how to tune ML models with skrub DataOps: skrub-data.org/stable/auto_...

Hyperparameter tuning with DataOps

A machine-learning pipeline typically contains some values or choices which may influence its prediction performance, such as hyperparameters (e.g. the regularization parameter alpha of a RidgeClas...

skrub-data.org

September 12, 2025 at 12:56 PM

Skrub

@skrub-data.bsky.social

The plot in the video was created for our EuroSciPy 2025 tutorial on forecasting time series: skrub-data.org/EuroSciPy202...

Skrub DataOps applied to forecasting timeseries

skrub-data.org

September 12, 2025 at 12:56 PM

Skrub

@skrub-data.bsky.social

The plot is interactive: you can select a range of results, and it will highlight only the runs within that range, enabling you to refine your search further. It also tracks fit and score times, so you can identify which parameters most impact runtime.

September 12, 2025 at 12:56 PM

Skrub

@skrub-data.bsky.social

skrub-data.org/stable/refer...

SquashingScaler

Gallery examples: SquashingScaler: Robust numerical preprocessing for neural networks

skrub-data.org

September 5, 2025 at 8:47 AM
