Sergio Paniego
@sergiopaniego.bsky.social
AI PhD. Technology enables us to be more human. 🏳️‍🌈
🧠 Following Hugging Face's blog on scaling test-time compute with open models (letting models "think longer," inspired by OpenAI & DeepMind), I created a recipe that scales inference-time compute for Instruct LLMs to tackle harder tasks like complex math problems.

Links below 👇
January 7, 2025 at 10:34 AM
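A minimal sketch of the core idea, not the recipe itself: spend more compute at inference by sampling several candidate answers and keeping the one a scorer prefers (best-of-N). The checkpoint id and the toy length-based scorer are assumptions; the actual recipe relies on a proper reward model and search strategies.

```python
# Best-of-N sketch of test-time scaling: sample several candidates and keep
# the best-scored one instead of a single greedy answer.
# The checkpoint and the toy scorer are placeholders, not the recipe's setup.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-1.7B-Instruct",  # assumed small instruct model
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

problem = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]

# Spend extra inference compute: draw N diverse samples.
candidates = generator(
    problem,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    num_return_sequences=4,
)

def score(answer: str) -> float:
    # Toy proxy: prefer longer reasoning traces. A real setup would score
    # candidates with a process/outcome reward model or majority voting.
    return float(len(answer))

best = max(candidates, key=lambda c: score(c["generated_text"][-1]["content"]))
print(best["generated_text"][-1]["content"])
```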
I’m a big fan of smol models—compact, efficient, and perfect for inference/training on limited resources. Even better when they’re multimodal! 🤏✨

I explored fine-tuning SmolVLM, a multimodal smol model, using TRL with both SFT and DPO, and created two hands-on projects!

🔗Links below👇
December 18, 2024 at 8:23 AM
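For context, a rough sketch of the SFT half with TRL. The checkpoint and dataset ids are assumptions, and the real notebooks add LoRA, 4-bit quantization, and proper label masking so this fits comfortably on a small GPU.

```python
# Hedged sketch of supervised fine-tuning a smol VLM with TRL's SFTTrainer.
# Checkpoint and dataset ids are assumptions; no label masking for brevity.
import torch
from datasets import load_dataset
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import SFTConfig, SFTTrainer

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

dataset = load_dataset("HuggingFaceM4/ChartQA", split="train[:1%]")  # assumed dataset

def collate_fn(examples):
    # Render each example as a chat turn with one image, then tokenize.
    texts, images = [], []
    for ex in examples:
        messages = [
            {"role": "user", "content": [
                {"type": "image"},
                {"type": "text", "text": ex["query"]},
            ]},
            {"role": "assistant", "content": [{"type": "text", "text": ex["label"][0]}]},
        ]
        texts.append(processor.apply_chat_template(messages, tokenize=False))
        images.append([ex["image"]])
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    batch["labels"] = batch["input_ids"].clone()  # simplistic: no masking
    return batch

training_args = SFTConfig(
    output_dir="smolvlm-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    remove_unused_columns=False,
    dataset_kwargs={"skip_prepare_dataset": True},  # the collator does everything
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=collate_fn,
    processing_class=processor.tokenizer,
)
trainer.train()
```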
💡I've been exploring how to go smol with multimodal RAG.

I've built a project that uses SmolVLM and ColSmolVLM for a multimodal RAG pipeline that runs on Colab's free tier.

Featuring:
🤏👀 SmolVLM (VLM)
🤏📚 ColSmolVLM (Doc Retrieval)
⚙️ Runs in Colab's free-tier GPU

Link below
December 16, 2024 at 5:23 PM
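A hedged sketch of the generation half of such a pipeline: once the retriever has picked the most relevant page image, SmolVLM answers over it. The checkpoint id and the placeholder page image are assumptions; the retrieval scoring itself looks like the ColQwen2 sketch further down.

```python
# Sketch: answer a question over the retrieved document page with SmolVLM.
# The blank image is a stand-in for the page your retriever ranked highest.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

best_page = Image.new("RGB", (448, 448), "white")  # stand-in for the top retrieval hit
question = "What was the revenue growth in 2023?"

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": question},
    ],
}]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[best_page], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```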
💡 New Multimodal RAG Recipe with Re-Ranking 💡

I explored how to enhance a multimodal RAG pipeline by integrating a re-ranker!

Featuring:
✨ Qwen2-VL-7B (VLM)
📚 ColQwen2 (Doc Retrieval)
🔍 MonoQwen2 (Re-ranking)
🔥 Optimized for consumer GPUs with quantized VLMs.

Link below:
December 12, 2024 at 5:29 PM
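A rough sketch of the first-stage retrieval with ColQwen2-style multi-vector (late-interaction) scoring, assuming the colpali-engine package and a vidore ColQwen2 checkpoint; the recipe then re-ranks the top pages with MonoQwen2-VL before handing the best one to Qwen2-VL for generation.

```python
# Hedged sketch of first-stage retrieval with ColQwen2 multi-vector scoring.
# Blank images stand in for rendered PDF pages; the checkpoint id is assumed.
import torch
from PIL import Image
from colpali_engine.models import ColQwen2, ColQwen2Processor

model_id = "vidore/colqwen2-v1.0"  # assumed retriever checkpoint
retriever = ColQwen2.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
retriever_processor = ColQwen2Processor.from_pretrained(model_id)

pages = [Image.new("RGB", (448, 448), "white") for _ in range(3)]  # rendered pages
query = "What was the revenue growth in 2023?"

with torch.no_grad():
    page_batch = retriever_processor.process_images(pages).to(retriever.device)
    query_batch = retriever_processor.process_queries([query]).to(retriever.device)
    page_embeddings = list(torch.unbind(retriever(**page_batch).cpu()))
    query_embeddings = list(torch.unbind(retriever(**query_batch).cpu()))

# Late-interaction (MaxSim) scores between the query and every page.
scores = retriever_processor.score_multi_vector(query_embeddings, page_embeddings)
top_pages = scores[0].topk(2).indices.tolist()
# A re-ranker (MonoQwen2-VL in the recipe) would re-order these top pages
# before the best one is passed to the VLM for answer generation.
print(top_pages)
```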
Reposted by Sergio Paniego
Learn how to build a complete multimodal RAG pipeline with
ColQwen2 as the retriever, MonoQwen2-VL as the reranker, and Qwen2-VL as the VLM, in this notebook that runs on a GPU as small as an L4 🔥 huggingface.co/learn/cookbo...
December 12, 2024 at 2:31 PM
✨ Gave a talk on autonomous driving today to undergrad students! We covered everything from definitions to real-world examples, plus cutting-edge concepts like Generative World Models and Vision-Language Models (VLMs). Exciting future ahead! 🚗💡
December 3, 2024 at 5:12 PM
This is such a cool project, and it was a truly exciting experience to contribute to it!! 😀
December 2, 2024 at 11:47 AM
Reposted by Sergio Paniego
We took those TRL notebooks from last week and made a page from them. So if you're upskilling on fine-tuning or aligning LLMs, and want examples from the community (like Maxime Labonne, Philipp Schmid, and Sergio Paniego Blanco), check it out!

bsky.app/profile/benb...

>> huggingface.co/docs/trl/mai...
December 2, 2024 at 9:18 AM
I've been exploring the latest Llama 3.2 releases and working on a couple of projects you may find interesting:

1️⃣ Understanding tool calling with Llama 3.2 🔧
2️⃣ Using Text Generation Inference (TGI) with Llama models 🦙

(links in the next post)
November 29, 2024 at 10:10 AM
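On the tool-calling side, a hedged sketch of the mechanism: transformers chat templates can turn a Python function's signature and docstring into a tool schema the model may choose to call. The model id and the toy tool below are placeholders.

```python
# Sketch of tool calling: the chat template turns a Python function's
# signature and docstring into a tool schema the model can decide to call.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed (gated) checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The city and country, e.g. "Madrid, Spain"
    """
    return 22.0  # stub: a real tool would call a weather API

messages = [{"role": "user", "content": "How warm is it in Madrid right now?"}]

# The tool schema is derived from the function's type hints and docstring.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# The model should emit a structured call to get_current_temperature.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```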
💡 A few days ago, I came across a fascinating post about Agentic RAG by Erika Cardenas and Leonie Monigatti, and it inspired me to dive into the concept and bring it to life in code!
November 27, 2024 at 5:26 PM
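A toy, library-agnostic sketch of what makes RAG "agentic": the model first decides whether it needs to search, and with what query, instead of always being fed retrieved context. Every piece below is a stand-in.

```python
# Toy sketch of the agentic RAG loop. Both the "LLM" and the retriever are
# stand-ins; the point is that the model decides whether and what to retrieve.
def retrieve(query: str) -> str:
    # Placeholder retriever; a real agent would query a vector store or the web.
    return "Doc: The 2024 kickoff meeting is scheduled for March 4th."

def llm(prompt: str) -> str:
    # Placeholder model call; swap in a real chat model.
    if "Decide" in prompt:
        return "SEARCH: 2024 kickoff meeting date"
    return "The kickoff meeting is on March 4th."

question = "When is the 2024 kickoff meeting?"

decision = llm(
    f"Decide whether you need external documents to answer: {question}\n"
    "Reply 'SEARCH: <query>' or 'ANSWER'."
)

if decision.startswith("SEARCH:"):
    # The agent chose to retrieve; ground the final answer in the documents.
    context = retrieve(decision.removeprefix("SEARCH:").strip())
    answer = llm(f"Context: {context}\nQuestion: {question}")
else:
    answer = llm(question)

print(answer)
```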
Reposted by Sergio Paniego
4/6 More vision skills for complex visual tasks. This tutorial shows how to fine-tune the Qwen2-VL-7B model for visual question answering using the ChartQA dataset.

huggingface.co/learn/cookbo...

by @sergiopaniego.bsky.social
Fine-Tuning a Vision Language Model (Qwen2-VL-7B) with the Hugging Face Ecosystem (TRL) - Hugging Face Open-Source AI Cookbook
November 25, 2024 at 10:16 AM
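As a hedged sketch of how a 7B VLM fits on a single GPU for fine-tuning: load the base model in 4-bit and train only small LoRA adapters. The checkpoint id and LoRA hyperparameters below are assumptions.

```python
# Hedged sketch of QLoRA-style setup for a 7B VLM: 4-bit base model plus
# LoRA adapters on the attention projections. Hyperparameters are assumed.
import torch
from peft import LoraConfig, get_peft_model
from transformers import BitsAndBytesConfig, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # assumed checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only a small fraction of the full model
```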
Reposted by Sergio Paniego
TRL is a cornerstone of LLM post-training and imo it's the default to learn.

There are great alternatives like Unsloth, Axolotl, and AutoTrain. But if you want a daily driver that takes you from experimentation to production, it's TRL.

🧵 these community notebooks guide you through TRL's core:
November 25, 2024 at 10:16 AM
hola 👋 hi 👋
November 23, 2024 at 6:07 PM