Soledad Galli, PhD
@solegalli.bsky.social
Data scientist, best selling instructor, book author, Python 🐍 open-source developer (check out Feature-engine).

Find out more at Train in Data: https://www.trainindata.com/
A few years back BORUTA was all over the web and data science competition forums.

Since then... silence... is it really dead?

I did some research, and this is what I found out:
www.blog.trainindata.com/is-boruta-de...
Is Boruta dead? - Train in Data's Blog
The most exhaustive discussion on boruta in machine learning. Learn what it is, advantages and limitations, and its Python implementation.
www.blog.trainindata.com
January 12, 2026 at 12:30 PM
New payment method rolled out for all our courses!

You can now pay in your own currency* and avoid hidden bank or country specific fees.

We look forward to seeing you on our courses.

*At the moment, only 20 currencies are supported.

champ.ly/6WkK6AA3
January 5, 2026 at 5:45 PM
Should you use imbalanced-learn in 2025?

SMOTE, oversampling and undersampling have been proposed as the workhorses for tackling imbalanced data.

But do they really work?

We talk about that in this article.
www.blog.trainindata.com/should-you-u...
Should You Use Imbalanced-Learn in 2025? - Train in Data's Blog
I discuss the latest evidence on the use of undersampling and SMOTE for imbalanced data and whether the Python library is still useful.
www.blog.trainindata.com
December 3, 2025 at 12:30 PM
Moving averages have long been used as a benchmark forecasting model.

Did you know that you can also use moving averages as input features?

If not, check out this blog to find out more, together with Python implementations:

www.blog.trainindata.com/master-movin...
Moving Average Forecasting: What You Need to Know - Train in Data's Blog
Learn moving average forecasting with clear examples, practical applications, and accuracy tips for better time series predictions.
www.blog.trainindata.com
November 3, 2025 at 12:30 PM
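A minimal sketch of the moving-average-as-feature idea from the post above, using only the Python standard library. The helper name `lagged_moving_average` and the toy `sales` series are made up for illustration and are not taken from the linked blog. The window covers only values before the current time step, so the feature would be known at prediction time.

```python
from statistics import mean


def lagged_moving_average(series, window):
    """For each time step t, return the mean of the `window` values before t.

    The value at t itself is excluded, so the feature does not leak the
    target. Steps without enough history get None.
    """
    features = []
    for t in range(len(series)):
        if t < window:
            features.append(None)  # not enough past observations yet
        else:
            features.append(mean(series[t - window:t]))
    return features


# Made-up toy series (hypothetical data).
sales = [10, 12, 14, 16, 18, 20]
ma_3 = lagged_moving_average(sales, window=3)

# Pair each target value with its moving-average feature,
# dropping the rows that have no feature value.
rows = [
    {"y": y, "ma_3": ma}
    for y, ma in zip(sales, ma_3)
    if ma is not None
]
print(rows)
```

Each dictionary in `rows` could then serve as one training example for a regression model, with `ma_3` as an input and `y` as the target.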
Discover the latest thoughts on working with imbalanced data with our free booklet.

We discuss 3 recent articles that have changed the conversation on resampling and SMOTE👇

www.trainindata.com/p/7-takes-on...
October 27, 2025 at 12:30 PM
All our courses come with a 30-day money-back guarantee...

If you are unhappy for whatever reason, we give you the money back.

That's how confident we are that you'll ❤️ our courses.

#trainindata
October 24, 2025 at 11:28 PM
Next Monday on Data Bites: Six Cloud Platforms to Run Jupyter Notebooks for Free 🚀

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/bltkmoeitj

#machinelearning #datascience #jupyter #mlmodels #ML #mltools #notebooks #cloudplatforms
August 29, 2025 at 10:02 AM
Imbalanced datasets can mess with your ML models. 😬
ADASYN (Adaptive Synthetic Sampling) to the rescue! 🚀

Learn how it works + when to use it in our latest blog 👇
https://f.mtr.cool/rqstrumpnx

#MachineLearning #DataScience #ImbalancedData #ADASYN
ADASYN: Adaptive Synthetic Sampling for Imbalanced Datasets - Train in Data's Blog
ADASYN can be used to handle data imbalance by creating synthetic samples of the minority class and improve model performance. Really?
f.mtr.cool
August 28, 2025 at 4:02 PM
👉 MICE is a powerful method for datasets with missing data across multiple variables.

Let this slide guide you through how it works. 

#machinelearning #MICE #mlmodels #datascience #dataengineering #imputation #featureengineering
August 27, 2025 at 4:02 PM
How to construct ensembles from a thousand models?

In this article, Caruana, a prominent figure in machine learning and ensemble methods, and his co-authors explain how to build ensembles from libraries of thousands of machine learning models.
📄 https://f.mtr.cool/fpaqqnqxms
August 26, 2025 at 4:02 PM
Clustering & Dimensionality Reduction: your toolkit for finding patterns, simplifying data, and solving real-world problems.

🔍 You’ll:
✅ Group data (K-means, DBSCAN & more)
✅ Reduce complexity (PCA, UMAP)
✅ Work on real cases like RNA profiling

📍 https://f.mtr.cool/hdjiwbbsbl
August 25, 2025 at 4:02 PM
Next Monday on Data Bites: Working with imbalanced data? Follow these 3 steps.

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/svpfklfpda

#machinelearning #datascience #CV #mlmodels #ML #MLCareer #MLresume
August 22, 2025 at 10:02 AM
Model performance matters! 🎯 

In this article, we break down essential evaluation metrics for classification models, starting with the Confusion Matrix. Perfect for anyone looking to build reliable #machinelearning systems!

Have a good read👇
Confusion Matrix, Precision, and Recall - Train in Data's Blog
Find out what the confusion matrix is and how it relates to other classification metrics like precision, recall and f1-score.
f.mtr.cool
August 21, 2025 at 4:02 PM
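To make the metrics named in the post above concrete, here is a small sketch that derives precision, recall and F1 from the four cells of a binary confusion matrix. The function names and the toy labels are illustrative assumptions, not code from the linked article.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall and F1 from the confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up labels: 4 positives and 6 negatives (hypothetical data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn, precision, recall, f1)
```

Note that accuracy here would be 0.8 while precision and recall are both 0.75, which is why the confusion matrix is a useful starting point for imbalanced problems.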
ELI5 now supports scikit-learn 1.6.0! 🎉 It wasn’t working with the latest version of scikit-learn, but that’s a thing of the past.

As of now, ELI5 has released a new version with full support for scikit-learn >1.6.0 and Python >3.10.

Check it out 👇
GitHub - eli5-org/eli5: A library for debugging/inspecting machine learning classifiers and explaining their predictions
A library for debugging/inspecting machine learning classifiers and explaining their predictions - eli5-org/eli5
f.mtr.cool
August 20, 2025 at 4:02 PM
Can we use statistical tests to select features? 🤔

Turns out, we can! 🎉

In the slides below, we’ll explore the most commonly used statistical tests for feature selection, along with their advantages and limitations. 👇

#machinelearning #datascience #featureselection
August 19, 2025 at 4:02 PM
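As one possible illustration of the post above, the sketch below scores binary features against a binary target with a chi-square test of independence. The slides are not reproduced here, so the choice of test, the function name `chi_square_2x2` and the toy data are assumptions. The sketch applies no continuity correction and assumes both variables take both values, so no expected count is zero.

```python
import math
from collections import Counter


def chi_square_2x2(feature, target):
    """Chi-square statistic and p-value for two binary (0/1) variables.

    With one degree of freedom, the survival function of the chi-square
    distribution is erfc(sqrt(x / 2)), so no external library is needed.
    """
    n = len(feature)
    observed = Counter(zip(feature, target))
    feature_totals = Counter(feature)
    target_totals = Counter(target)

    statistic = 0.0
    for f in (0, 1):
        for t in (0, 1):
            # Expected count if feature and target were independent.
            expected = feature_totals[f] * target_totals[t] / n
            statistic += (observed[(f, t)] - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up data (hypothetical): one feature tracks the target, one does not.
target = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [1, 1, 1, 1, 0, 0, 0, 0]
noise = [1, 0, 1, 0, 1, 0, 1, 0]

stat_inf, p_inf = chi_square_2x2(informative, target)
stat_noise, p_noise = chi_square_2x2(noise, target)
print(stat_inf, p_inf, stat_noise, p_noise)
```

A simple selection rule would keep features whose p-value falls below a chosen threshold, bearing in mind that univariate tests ignore interactions between features.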
🚨 It’s here! Our new course on Clustering & Dimensionality Reduction just dropped 🎉

Learn how to group data (K-Means, DBSCAN, Louvain) + simplify it with PCA & UMAP, no prior experience needed!

Hands-on & practical 👇
👉  https://f.mtr.cool/zshxexbrds

#MachineLearning #DataScience
August 18, 2025 at 4:02 PM
Next Monday on Data Bites: How to Write a Winning Data Science CV

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/nozrfuruar

#machinelearning #datascience #CV #mlmodels #ML #MLCareer #MLresume
August 15, 2025 at 10:02 AM
Deep learning has transformed our daily lives, but designing neural networks remains a challenge. 

Automated hyperparameter optimization (HPO) streamlines the process. This paper reviews key techniques & tools for improving model accuracy & efficiency.
📃https://f.mtr.cool/wowjcrmwjg
August 14, 2025 at 4:02 PM
🚨 SMOTE has long been hailed as the go-to solution for imbalanced datasets, but it only works in specific scenarios. 

In this article, we explore when SMOTE is truly effective & why it’s remained popular. 

Check it out!
https://f.mtr.cool/medbbpfril
August 12, 2025 at 4:01 PM
🚨 Just launched: our new course on Clustering & Dimensionality Reduction is live at Train in Data!

Learn to group data, reduce complexity with PCA & UMAP, and tackle real-world projects (no experience needed!)

🎓 Join us: https://f.mtr.cool/wlhxbboqkl
August 11, 2025 at 4:02 PM
Next Monday on Data Bites: Everybody says “SMOTE does not work”.

Want to know more?

Click the link below to subscribe and stay tuned!👇
https://f.mtr.cool/pinchbaedf

#machinelearning #datascience #smote #mlmodels #ML
August 8, 2025 at 10:01 AM
In this video, I review hyperparameter optimization techniques like Grid Search, Random Search, & Bayesian methods.

Learn their pros, cons, and best applications for both low and high-dimensional spaces! 

What techniques do you use? 
📽️
f.mtr.cool
August 7, 2025 at 4:02 PM
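A rough sketch contrasting two of the techniques named in the post above, grid search and random search, on a toy problem. The `validation_score` function is a hypothetical stand-in for a cross-validated model score, and the hyperparameter names and ranges are assumptions for illustration. Bayesian methods are left out because they need a surrogate model.

```python
import itertools
import math
import random


def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for a cross-validated score (higher is better).

    It peaks at learning_rate=0.01 and max_depth=5 by construction.
    """
    return -((math.log10(learning_rate) + 2) ** 2) - 0.1 * (max_depth - 5) ** 2


def grid_search(learning_rates, max_depths):
    """Evaluate every combination on a fixed grid and return the best one."""
    candidates = itertools.product(learning_rates, max_depths)
    return max(candidates, key=lambda params: validation_score(*params))


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and return the best trial and all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        learning_rate = 10 ** rng.uniform(-4, 0)  # log-uniform in [1e-4, 1]
        max_depth = rng.randint(2, 10)            # inclusive integer range
        trials.append((learning_rate, max_depth))
    best = max(trials, key=lambda params: validation_score(*params))
    return best, trials


best_grid = grid_search([0.001, 0.01, 0.1], [3, 5, 7])
best_random, trials = random_search(n_trials=20)
print("grid:", best_grid)
print("random:", best_random)
```

The grid evaluates 9 fixed combinations, and its cost grows multiplicatively with every hyperparameter added. Random search keeps the budget fixed at `n_trials` regardless of dimensionality, which is the usual argument for it in high-dimensional spaces.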
🐍 Python libraries that implement model-agnostic global explainability methods 👇

#python #machinelearning #MLModel #datascience #dataengineering
August 6, 2025 at 4:02 PM