Sergio Paniego
@sergiopaniego.bsky.social
AI PhD. Technology enables us to be more human. 🏳️‍🌈
🧠 Following Hugging Face's blog on scaling test-time compute with open models (letting models "think longer," inspired by OpenAI & DeepMind), I created a recipe that scales inference-time compute for Instruct LLMs to tackle harder tasks like complex math problems.

Links below 👇
January 7, 2025 at 10:34 AM
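A minimal sketch of the core idea, not the recipe itself: spend more compute at inference by sampling several candidate answers and keeping the one a scorer prefers (best-of-N). The checkpoint id and the toy length-based scorer are assumptions; the actual recipe relies on a proper reward model and search strategies.

```python
# Best-of-N sketch of test-time scaling: sample several candidates and keep
# the best-scored one instead of a single greedy answer.
# The checkpoint and the toy scorer are placeholders, not the recipe's setup.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="HuggingFaceTB/SmolLM2-1.7B-Instruct",  # assumed small instruct model
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

problem = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]

# Spend extra inference compute: draw N diverse samples.
candidates = generator(
    problem,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    num_return_sequences=4,
)

def score(answer: str) -> float:
    # Toy proxy: prefer longer reasoning traces. A real setup would score
    # candidates with a process/outcome reward model or majority voting.
    return float(len(answer))

best = max(candidates, key=lambda c: score(c["generated_text"][-1]["content"]))
print(best["generated_text"][-1]["content"])
```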
I’m a big fan of smol models—compact, efficient, and perfect for inference/training on limited resources. Even better when they’re multimodal! 🤏✨

I explored fine-tuning SmolVLM, a multimodal smol model, using TRL with both SFT and DPO, and created two hands-on projects!

🔗Links below👇
December 18, 2024 at 8:23 AM
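For context, a rough sketch of the SFT half with TRL. The checkpoint and dataset ids are assumptions, and the real notebooks add LoRA, 4-bit quantization, and proper label masking so this fits comfortably on a small GPU.

```python
# Hedged sketch of supervised fine-tuning a smol VLM with TRL's SFTTrainer.
# Checkpoint and dataset ids are assumptions; no label masking for brevity.
import torch
from datasets import load_dataset
from transformers import AutoModelForVision2Seq, AutoProcessor
from trl import SFTConfig, SFTTrainer

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

dataset = load_dataset("HuggingFaceM4/ChartQA", split="train[:1%]")  # assumed dataset

def collate_fn(examples):
    # Render each example as a chat turn with one image, then tokenize.
    texts, images = [], []
    for ex in examples:
        messages = [
            {"role": "user", "content": [
                {"type": "image"},
                {"type": "text", "text": ex["query"]},
            ]},
            {"role": "assistant", "content": [{"type": "text", "text": ex["label"][0]}]},
        ]
        texts.append(processor.apply_chat_template(messages, tokenize=False))
        images.append([ex["image"]])
    batch = processor(text=texts, images=images, return_tensors="pt", padding=True)
    batch["labels"] = batch["input_ids"].clone()  # simplistic: no masking
    return batch

training_args = SFTConfig(
    output_dir="smolvlm-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    remove_unused_columns=False,
    dataset_kwargs={"skip_prepare_dataset": True},  # the collator does everything
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=collate_fn,
    processing_class=processor.tokenizer,
)
trainer.train()
```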
💡I've been exploring how to go smol with multimodal RAG.

I've built a project that uses SmolVLM and ColSmolVLM for a multimodal RAG pipeline that runs on Colab's free tier.

Featuring:
🤏👀 SmolVLM (VLM)
🤏📚 ColSmolVLM (Doc Retrieval)
⚙️ Runs in Colab's free-tier GPU

Link below
December 16, 2024 at 5:23 PM
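A hedged sketch of the generation half of such a pipeline: once the retriever has picked the most relevant page image, SmolVLM answers over it. The checkpoint id and the placeholder page image are assumptions; the retrieval scoring itself looks like the ColQwen2 sketch further down.

```python
# Sketch: answer a question over the retrieved document page with SmolVLM.
# The blank image is a stand-in for the page your retriever ranked highest.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM-Instruct"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

best_page = Image.new("RGB", (448, 448), "white")  # stand-in for the top retrieval hit
question = "What was the revenue growth in 2023?"

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": question},
    ],
}]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[best_page], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```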
💡 New Multimodal RAG Recipe with Re-Ranking 💡

I explored how to enhance a multimodal RAG pipeline by integrating a re-ranker!

Featuring:
✨ Qwen2-VL-7B (VLM)
📚 ColQwen2 (Doc Retrieval)
🔍 MonoQwen2 (Re-ranking)
🔥 Optimized for consumer GPUs with quantized VLMs.

Link below:
December 12, 2024 at 5:29 PM
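A rough sketch of the first-stage retrieval with ColQwen2-style multi-vector (late-interaction) scoring, assuming the colpali-engine package and a vidore ColQwen2 checkpoint; the recipe then re-ranks the top pages with MonoQwen2-VL before handing the best one to Qwen2-VL for generation.

```python
# Hedged sketch of first-stage retrieval with ColQwen2 multi-vector scoring.
# Blank images stand in for rendered PDF pages; the checkpoint id is assumed.
import torch
from PIL import Image
from colpali_engine.models import ColQwen2, ColQwen2Processor

model_id = "vidore/colqwen2-v1.0"  # assumed retriever checkpoint
retriever = ColQwen2.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
retriever_processor = ColQwen2Processor.from_pretrained(model_id)

pages = [Image.new("RGB", (448, 448), "white") for _ in range(3)]  # rendered pages
query = "What was the revenue growth in 2023?"

with torch.no_grad():
    page_batch = retriever_processor.process_images(pages).to(retriever.device)
    query_batch = retriever_processor.process_queries([query]).to(retriever.device)
    page_embeddings = list(torch.unbind(retriever(**page_batch).cpu()))
    query_embeddings = list(torch.unbind(retriever(**query_batch).cpu()))

# Late-interaction (MaxSim) scores between the query and every page.
scores = retriever_processor.score_multi_vector(query_embeddings, page_embeddings)
top_pages = scores[0].topk(2).indices.tolist()
# A re-ranker (MonoQwen2-VL in the recipe) would re-order these top pages
# before the best one is passed to the VLM for answer generation.
print(top_pages)
```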
Reposted by Sergio Paniego
Learn how to build a complete multimodal RAG pipeline with
ColQwen2 as the retriever, MonoQwen2-VL as the reranker, and Qwen2-VL as the VLM, in this notebook that runs on a GPU as small as an L4 🔥 huggingface.co/learn/cookbo...
December 12, 2024 at 2:31 PM
✨ Gave a talk on autonomous driving today to undergrad students! We covered everything from definitions to real-world examples, plus cutting-edge concepts like Generative World Models and Vision-Language Models (VLMs). Exciting future ahead! 🚗💡
December 3, 2024 at 5:12 PM
This is such a cool project, and it was a truly exciting experience to contribute to it!! 😀
December 2, 2024 at 11:47 AM
Reposted by Sergio Paniego
We took those TRL notebooks from last week and made a page from them. So if you're upskilling on fine-tuning or aligning LLMs, and want examples from the community (like Maxime Labonne, Philipp Schmid, and Sergio Paniego Blanco), check it out!

bsky.app/profile/benb...

>> huggingface.co/docs/trl/mai...
December 2, 2024 at 9:18 AM
I've been exploring the latest Llama 3.2 releases and working on a couple of projects you may find interesting:

1️⃣ Understanding tool calling with Llama 3.2 🔧
2️⃣ Using Text Generation Inference (TGI) with Llama models 🦙

(links in the next post)
November 29, 2024 at 10:10 AM
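On the tool-calling side, a hedged sketch of the mechanism: transformers chat templates can turn a Python function's signature and docstring into a tool schema the model may choose to call. The model id and the toy tool below are placeholders.

```python
# Sketch of tool calling: the chat template turns a Python function's
# signature and docstring into a tool schema the model can decide to call.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed (gated) checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def get_current_temperature(location: str) -> float:
    """
    Get the current temperature at a location.

    Args:
        location: The city and country, e.g. "Madrid, Spain"
    """
    return 22.0  # stub: a real tool would call a weather API

messages = [{"role": "user", "content": "How warm is it in Madrid right now?"}]

# The tool schema is derived from the function's type hints and docstring.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_current_temperature],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# The model should emit a structured call to get_current_temperature.
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```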
💡 A few days ago, I came across a fascinating post about Agentic RAG by Erika Cardenas and Leonie Monigatti, and it inspired me to dive into the concept and bring it to life in code!
November 27, 2024 at 5:26 PM
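A toy, library-agnostic sketch of what makes RAG "agentic": the model first decides whether it needs to search, and with what query, instead of always being fed retrieved context. Every piece below is a stand-in.

```python
# Toy sketch of the agentic RAG loop. Both the "LLM" and the retriever are
# stand-ins; the point is that the model decides whether and what to retrieve.
def retrieve(query: str) -> str:
    # Placeholder retriever; a real agent would query a vector store or the web.
    return "Doc: The 2024 kickoff meeting is scheduled for March 4th."

def llm(prompt: str) -> str:
    # Placeholder model call; swap in a real chat model.
    if "Decide" in prompt:
        return "SEARCH: 2024 kickoff meeting date"
    return "The kickoff meeting is on March 4th."

question = "When is the 2024 kickoff meeting?"

decision = llm(
    f"Decide whether you need external documents to answer: {question}\n"
    "Reply 'SEARCH: <query>' or 'ANSWER'."
)

if decision.startswith("SEARCH:"):
    # The agent chose to retrieve; ground the final answer in the documents.
    context = retrieve(decision.removeprefix("SEARCH:").strip())
    answer = llm(f"Context: {context}\nQuestion: {question}")
else:
    answer = llm(question)

print(answer)
```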
Reposted by Sergio Paniego
4/6 More vision skills for complex visual tasks. This tutorial shows how to fine-tune the Qwen2-VL-7B model for visual question answering using the ChartQA dataset.

huggingface.co/learn/cookbo...

by @sergiopaniego.bsky.social
Fine-Tuning a Vision Language Model (Qwen2-VL-7B) with the Hugging Face Ecosystem (TRL) - Hugging Face Open-Source AI Cookbook
November 25, 2024 at 10:16 AM
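As a hedged sketch of how a 7B VLM fits on a single GPU for fine-tuning: load the base model in 4-bit and train only small LoRA adapters. The checkpoint id and LoRA hyperparameters below are assumptions.

```python
# Hedged sketch of QLoRA-style setup for a 7B VLM: 4-bit base model plus
# LoRA adapters on the attention projections. Hyperparameters are assumed.
import torch
from peft import LoraConfig, get_peft_model
from transformers import BitsAndBytesConfig, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # assumed checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # only a small fraction of the full model
```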
Reposted by Sergio Paniego
TRL is a cornerstone of LLM post-training and imo it's the default to learn.

There are great alternatives like Unsloth, Axolotl, and AutoTrain. But if you want a daily driver that takes you from experimentation to production, it's TRL.

🧵 these community notebooks guide you through TRL's core:
November 25, 2024 at 10:16 AM
hola 👋 hi 👋
November 23, 2024 at 6:07 PM