Lightnews — Scholar-powered news

Reposted by Duc Nguyen Huu

Kevin Markham

@dataschool.io

Dream unlocked: I'm publishing my first book! 🎉🎉🎉

It's called "Master Machine Learning with scikit-learn: A Practical Guide to Building Better Models with Python"

Download the first 3 chapters right now:
👉 dataschool.kit.com/mlbook 👈

Thanks for your support 🙏

September 11, 2025 at 5:53 PM

Reposted by Duc Nguyen Huu

David Holzmüller

@dholzmueller.bsky.social

I got 3rd out of 691 in a tabular kaggle competition – with only neural networks! 🥉

My solution is short (48 LOC) and relatively general-purpose – I used skrub to preprocess string and date columns, and pytabkit to create an ensemble of RealMLP and TabM models. Link below👇

July 29, 2025 at 11:10 AM

Reposted by Duc Nguyen Huu

Gaël Varoquaux

@gaelvaroquaux.bsky.social

This work is presented at ICML next week.
• The paper arxiv.org/html/2502.05...
• The python package: pypistats.org/packages/tab... (try it out 🐍)
• The source code github.com/soda-inria/t... (100% open source, including pre-training 💞)

Longer read (5mn): gael-varoquaux.info/science/tabi...
8/9

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data

arxiv.org

July 9, 2025 at 6:42 PM

Reposted by Duc Nguyen Huu

Gaël Varoquaux

@gaelvaroquaux.bsky.social

👨‍🎓🧾✨#icml2025 Paper: TabICL, A Tabular Foundation Model for In-Context Learning on Large Data
With Jingang Qu, @dholzmueller.bsky.social, and Marine Le Morvan

TL;DR: a well-designed architecture and pretraining give the best tabular learner, and a more scalable one
On top of that, it's 100% open source
1/9

July 9, 2025 at 6:42 PM

Reposted by Duc Nguyen Huu

Kevin Markham

@dataschool.io

My thoughts on the current state of AI progress and the most important developments in 2025:

www.dataschool.io/ai-progress-...

AI progress in 2025 📈

Thoughts on the current state of AI progress and the most important developments in 2025

www.dataschool.io

May 28, 2025 at 2:17 PM

Duc Nguyen Huu

@ducnh279.bsky.social

Sebastian Raschka (rasbt) @sebastianraschka.com · Apr 19

Just shared a new article on "The State of Reinforcement Learning for LLM Reasoning"!
If you are new to reinforcement learning, this article has a generous intro section (PPO, GRPO, etc)
Also, I cover 15 recent articles focused on RL & Reasoning.

🔗 magazine.sebastianraschka.com/p/the-state-...

April 20, 2025 at 12:25 PM

Duc Nguyen Huu

@ducnh279.bsky.social

Are you familiar with Token Pooling?

Models that use late interaction, like ColBERT, ColPali, and ColQwen, gain significant benefits from this pooling technique! By integrating token pooling methods, the number of vectors to store can be reduced.

Blog: www.answer.ai/posts/colber...

A little pooling goes a long way for multi-vector representations – Answer.AI

Practical AI R&D

www.answer.ai

April 4, 2025 at 11:41 PM

Duc Nguyen Huu

@ducnh279.bsky.social

Efficiently scale long CoT models like DeepSeek when using Best-of-N or Majority Voting by early pruning reasoning chains.

Kaggle Discussion: www.kaggle.com/competitions...

AI Mathematical Olympiad - Progress Prize 2

Solve national-level math challenges using artificial intelligence models

www.kaggle.com

April 4, 2025 at 7:48 PM
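
One simple way to cut compute in majority voting is to stop as soon as the leading answer can no longer be overtaken, so the chains still running can be cancelled. The sketch below illustrates that idea only. It is not the method from the linked Kaggle discussion, and the function name, the `answers` iterable, and the `n_total` parameter are hypothetical.

```python
from collections import Counter


def majority_vote_with_early_stop(answers, n_total):
    """Return (winning_answer, chains_consumed).

    `answers` yields final answers in the order chains finish (assumed input).
    `n_total` is the number of chains that were launched.
    """
    counts = Counter()
    seen = 0
    for answer in answers:
        counts[answer] += 1
        seen += 1
        ranked = counts.most_common(2)
        leader, lead_votes = ranked[0]
        runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
        remaining = n_total - seen
        # Even if every unfinished chain voted for the runner-up (or for a
        # brand-new answer), the leader would still win: stop here.
        if lead_votes > runner_up_votes + remaining:
            return leader, seen
    # No early decision: fall back to a plain majority vote.
    leader = counts.most_common(1)[0][0] if counts else None
    return leader, seen


finished = ["42", "42", "7", "42", "42", "42", "7", "7"]
winner, used = majority_vote_with_early_stop(finished, n_total=8)
```

In this toy run the vote is settled after six of the eight chains, so the last two would never need to finish.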

Duc Nguyen Huu

@ducnh279.bsky.social

I find making your agents safe is just as important as making them smart. 🔒

A good read for building secure AI!

arxiv.org/pdf/2503.18813

March 31, 2025 at 12:47 PM

Duc Nguyen Huu

@ducnh279.bsky.social

There will be one day ... in 🇺🇸 or 🇻🇳

March 30, 2025 at 7:37 PM

Reposted by Duc Nguyen Huu

Kevin Markham

@dataschool.io

Claude finally integrated web search into its results...

But with LangChain & LangGraph, you can build a chatbot that integrates web search into ANY model you like!

You'll learn how to do that (and much more) in my new AI course...

Sign up for EARLY ACCESS:
👉 dataschool.kit.com/agents 👈

March 27, 2025 at 11:59 AM

Duc Nguyen Huu

@ducnh279.bsky.social

A practical way for students to secure jobs and earn money is by developing real-world projects. Researching or engineering LLMs often seems like a field dominated by big tech!

It's still important to learn the fundamentals from scratch for growth and problem-solving (e.g., being able to fix things)! 😁

Kevin Markham @dataschool.io · Mar 24

Just finished recording my new AI course 😅

Sign up for early access: dataschool.kit.com/agents

March 24, 2025 at 4:17 PM

Reposted by Duc Nguyen Huu

Sebastian Raschka (rasbt)

@sebastianraschka.com

My next tutorial on pretraining an LLM from scratch is now out. It starts with a step-by-step walkthrough of understanding, calculating, and optimizing the loss. After training, we update the text generation function with temperature scaling and top-k sampling: www.youtube.com/watch?v=Zar2...

March 23, 2025 at 1:38 PM
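
The temperature scaling and top-k sampling mentioned in the post above can be sketched in plain Python. This is a minimal illustration and not the tutorial's code: the function names and the toy logits are assumptions, and a real implementation would operate on PyTorch tensors.

```python
import math
import random


def top_k_temperature_probs(logits, k, temperature):
    """Keep the k largest logits, divide by temperature, then apply softmax.

    `temperature` must be > 0; values below 1 sharpen the distribution,
    values above 1 flatten it.
    """
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = {i: logits[i] / temperature for i in top_ids}
    max_scaled = max(scaled.values())  # subtract the max for numerical stability
    exps = {i: math.exp(v - max_scaled) for i, v in scaled.items()}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def sample_next_token(logits, k=3, temperature=1.0, rng=random):
    """Sample one token ID from the top-k, temperature-scaled distribution."""
    probs = top_k_temperature_probs(logits, k, temperature)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
token_id = sample_next_token(toy_logits, k=2, temperature=0.7)
```

With `k=2`, only token IDs 0 and 1 can ever be sampled, and lowering the temperature shifts more probability onto ID 0.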

Reposted by Duc Nguyen Huu

Rashmi Banthia

@rashmib.bsky.social

cuDF-pandas (%load_ext cudf.pandas) with RAPIDS ... works similarly, and it's super cool to see that we will be able to speed up scikit-learn

March 20, 2025 at 9:59 AM

Duc Nguyen Huu

@ducnh279.bsky.social

Scikit-learn accelerated 🚀

My company has a bunch of unused T4 GPUs because the LLMs are too big for the AI teams to run experiments on them. Now the data science team finally has a reason to ask for them! 🤣

developer.nvidia.com/blog/nvidia-...

NVIDIA cuML Brings Zero Code Change Acceleration to scikit-learn | NVIDIA Technical Blog

Scikit-learn, the most widely used ML library, is popular for processing tabular data because of its simple API, diversity of algorithms, and compatibility with popular Python libraries such as pandas...

developer.nvidia.com

March 20, 2025 at 7:57 AM

Reposted by Duc Nguyen Huu

Kevin Markham

@dataschool.io

In honor of March Madness 🏀, I've got a new blog post:

www.dataschool.io/pandas-strea...

Learn how to identify & analyze scoring streaks using pandas operations:

- shift()
- cumsum()
- boolean math
- groupby()

How to calculate "scoring streaks" with pandas 🏀

Learn how to identify & analyze consecutive events in your data using advanced DataFrame methods!

www.dataschool.io

March 17, 2025 at 1:53 PM
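
The four operations listed in the post above combine naturally into a streak counter. The sketch below is an illustration under assumed data, not the code from the linked blog post: the `shots` DataFrame, its column names, and `add_streaks` are all hypothetical.

```python
import pandas as pd

# Hypothetical data: one row per shot attempt, in chronological order.
shots = pd.DataFrame({
    "player": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "made":   [1,   1,   0,   1,   1,   1,   0,   0],
})


def add_streaks(df):
    """Add a running count of consecutive made shots per player."""
    out = df.copy()
    # shift(): previous outcome for the same player (NaN on a player's first row)
    prev = out.groupby("player")["made"].shift()
    # boolean math: a new run starts whenever the outcome differs from the previous one
    out["new_run"] = (out["made"] != prev).astype(int)
    # cumsum(): number the runs within each player
    out["run_id"] = out.groupby("player")["new_run"].cumsum()
    # groupby() + cumsum(): count made shots inside each run (misses stay at 0)
    out["streak"] = out.groupby(["player", "run_id"])["made"].cumsum()
    return out


result = add_streaks(shots)
longest = result.groupby("player")["streak"].max()
```

Here `longest` holds the longest scoring streak for each player in the toy data.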

Duc Nguyen Huu

@ducnh279.bsky.social

Lots of good advice and best practices for missing value imputation in the paper!

I now have a much deeper appreciation for Data School's course and regard it as the best scikit-learn course.

Master Machine Learning with scikit-learn: courses.dataschool.io/master-machi...

March 18, 2025 at 3:55 PM

Reposted by Duc Nguyen Huu

Kevin Markham

@dataschool.io

"Some people today are discouraging others from learning programming on the grounds AI will automate it. This advice will be seen as some of the worst career advice ever given."

-- Andrew Ng, legendary AI researcher

Source: www.deeplearning.ai/the-batch/is...

DeepSeek-R1 Uncensored, QwQ-32B Puts Reasoning in Smaller Model, and more...

The Batch AI News and Insights: Some people today are discouraging others from learning programming on the grounds AI will automate it.

www.deeplearning.ai

March 13, 2025 at 6:05 PM

Reposted by Duc Nguyen Huu

Gaël Varoquaux

@gaelvaroquaux.bsky.social

A recent talk, fully in VS Code: 100% code on data wrangling for machine learning with @skrub-data.bsky.social
www.youtube.com/watch?v=hdWW...

super powerful to easily assemble production-ready pipelines in easy syntax

The Future of AI & Machine Learning | The Python Exchange February 2025

YouTube video by Don't Use This Code • James Powell

www.youtube.com

March 14, 2025 at 3:15 PM

Reposted by Duc Nguyen Huu

Sebastian Raschka (rasbt)

@sebastianraschka.com

Yesterday, Google released Gemma 3, their latest open-weight LLM. Finally, a new addition to the "Big 5" of open-weight models (Gemma, Llama, DeepSeek, Qwen, and Mistral). I just went through the Gemma 3 report and experimented a bit with the models, and there are plenty of interesting tidbits:

March 13, 2025 at 4:03 PM

Reposted by Duc Nguyen Huu

Sebastian Raschka (rasbt)

@sebastianraschka.com

Just uploaded my "Coding Attention Mechanisms" tutorial. A 2h15m session on coding attention mechanisms to understand how the engine of LLMs works:
self-attention → parameterized self-attention → causal self-attention → multi-head self-attention
www.youtube.com/watch?v=-Ll8...

Build an LLM from Scratch 3: Coding attention mechanisms

YouTube video by Sebastian Raschka

www.youtube.com

March 11, 2025 at 4:10 PM

Reposted by Duc Nguyen Huu

Sebastian Raschka (rasbt)

@sebastianraschka.com

I just shared a new article, "The State of Reasoning Models", where I am exploring 12 new research articles on improving the reasoning capabilities of LLMs (all published after the release of DeepSeek R1): magazine.sebastianraschka.com/p/state-of-l...

Happy reading!

The State of LLM Reasoning Models

Part 1: Inference-Time Compute Scaling Methods

magazine.sebastianraschka.com

March 8, 2025 at 2:37 PM

Reposted by Duc Nguyen Huu

Trey Hunner

@trey.io

A couple months ago @dataschool.io wrote about a tool he uses to chat with different LLM models without paying a monthly subscription to all of them.

The tool is called Typing Mind and I decided to pay $30 for lifetime access. It was well worth it.

Kevin's post 👇
www.dataschool.io/save-money-o...

Use premium AI models for pennies 💰

Learn how to access ChatGPT, Claude, and more for pennies per conversation rather than paying for expensive subscriptions!

www.dataschool.io

March 3, 2025 at 4:18 PM

Reposted by Duc Nguyen Huu

Kevin Markham

@dataschool.io

19 professionals (in a variety of fields) evaluated OpenAI's Deep Research vs Google's Deep Research.

OpenAI was the clear winner 🏆

Neat study by @binarybits.bsky.social, read more here: www.understandingai.org/p/these-expe...

March 4, 2025 at 3:55 PM

Reposted by Duc Nguyen Huu

Sebastian Raschka (rasbt)

@sebastianraschka.com

A new tutorial in my “Build A Large Language Model From Scratch” series is now live (www.youtube.com/watch?v=341R...)
- Tokenizing raw text and converting tokens into token IDs
- Applying byte pair encoding
- Setting up data loaders in PyTorch for efficient training

Build an LLM from Scratch 2: Working with text data

YouTube video by Sebastian Raschka

www.youtube.com

March 2, 2025 at 2:45 PM
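
The steps listed in the post above (text to tokens, tokens to token IDs, then input/target pairs for training) can be sketched with the standard library alone. This is a simplified stand-in rather than the tutorial's code: it uses a regex word tokenizer instead of byte pair encoding and plain lists instead of PyTorch data loaders, and every name in it is hypothetical.

```python
import re


def tokenize(text):
    """Split text into word and punctuation tokens (a stand-in for BPE)."""
    return re.findall(r"\w+|[^\w\s]", text)


def build_vocab(tokens):
    """Map each unique token to an integer ID, in sorted order."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}


def encode(text, vocab):
    """Convert text into a list of token IDs."""
    return [vocab[tok] for tok in tokenize(text)]


def sliding_windows(ids, context_len, stride):
    """Build (input, target) pairs where the target is the input shifted by one."""
    pairs = []
    for start in range(0, len(ids) - context_len, stride):
        inputs = ids[start:start + context_len]
        targets = ids[start + 1:start + context_len + 1]
        pairs.append((inputs, targets))
    return pairs


text = "the cat sat. the cat ran."
vocab = build_vocab(tokenize(text))
ids = encode(text, vocab)
pairs = sliding_windows(ids, context_len=4, stride=4)
```

Each target sequence is its input shifted one position to the right, which is the next-token prediction setup used in LLM pretraining.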
