Agustín Piqueres
plaguss.bsky.social
ML Engineer @huggingface 🤗 building distilabel
Reposted by Agustín Piqueres
Excited to announce the SFT dataset used for @huggingface.bsky.social SmolLM2!

The dataset for SmolLM2 was created by combining multiple existing datasets and generating new synthetic datasets, including MagPie Ultra v1.0, using distilabel.

Check out the dataset:
huggingface.co/datasets/Hug...
HuggingFaceTB/smoltalk · Datasets at Hugging Face
November 21, 2024 at 3:22 PM