get started with SOTA VLMs (Gemma 3, Qwen2.5VL, InternVL3 & more) and serve them wherever you want 🤩
learn more github.com/ggml-org/lla... 📖
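here's a minimal sketch of talking to such a server from Python, assuming you've started llama-server with a vision-capable GGUF; the port, image path and prompt below are placeholders, not the project's official example:

```python
# hedged sketch: llama-server exposes an OpenAI-compatible chat endpoint,
# so a plain HTTP request with a base64 image works
import base64
import requests

with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    "max_tokens": 128,
}
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```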
We have shipped a how-to guide for VDR models in Hugging Face transformers 🤗📖 huggingface.co/docs/transfo...
They're just like ColPali, but highly scalable and fast, and you can even make them more efficient with binarization or Matryoshka embeddings, with little degradation 🪆⚡️
I collected some here huggingface.co/collections/...
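here's a toy sketch of those two tricks on a generic embedding matrix (no specific model API assumed): Matryoshka-trained embeddings can simply be truncated, and binarization keeps only the sign bits:

```python
import numpy as np

emb = np.random.randn(10_000, 1536).astype("float32")  # stand-in for document-page embeddings

# matryoshka: keep only the first k dimensions (MRL-trained models degrade gracefully)
emb_small = np.ascontiguousarray(emb[:, :512])

# binarization: keep only the sign of each dimension and pack to bits
emb_bin = np.packbits((emb > 0).astype(np.uint8), axis=-1)

# storage in KB: full fp32 vs truncated vs binary
print(emb.nbytes // 1024, emb_small.nbytes // 1024, emb_bin.nbytes // 1024)
```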
get started ⤵️
> filter models by the inference providers that serve them
> test them through the widget or via Python/JS/cURL (Python sketch below)
collection is here huggingface.co/collections/...
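a minimal Python sketch using huggingface_hub's InferenceClient; the provider and model id are just examples, pick any pair from the collection:

```python
from huggingface_hub import InferenceClient

# needs an HF token in your environment; provider and model are examples
client = InferenceClient(provider="together")
out = client.chat_completion(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "Say hi in one word."}],
    max_tokens=16,
)
print(out.choices[0].message.content)
```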
RolmOCR-7B follows the same recipe as olmOCR: it builds on Qwen2.5VL with training-set modifications and improves accuracy & performance 🤝
huggingface.co/reducto/Rolm...
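since it's Qwen2.5VL-based, a hedged sketch of running it with transformers looks like this (the repo id is my assumption from the truncated link above, double-check it):

```python
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "reducto/RolmOCR"  # assumed repo id, see the link above
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("page.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Return the plain text of this page."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```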
the paper introduces an interesting pre-training pipeline for handling long context, and the model saw 4.4T tokens arxiv.org/pdf/2504.07491
Kimi-VL-A3B-Thinking is the first truly capable open-source reasoning VLM, released under an MIT license ❤️
> it has only 2.8B activated params 👏
> it's agentic 🔥 works on GUIs
> surpasses GPT-4o
I've put it to the test (see below ⤵️) huggingface.co/spaces/moons...
> 7 checkpoints in various sizes (1B to 78B)
> Built on the InternViT encoder and Qwen2.5VL decoder, improving on Qwen2.5VL
> Can do reasoning and document tasks, extending to tool use and agentic capabilities 🤖
> Easy to use with Hugging Face transformers 🤗 (sketch below) huggingface.co/collections/...
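a quick sketch with the image-text-to-text pipeline; the repo id is an example (grab the exact one from the collection above), and depending on your transformers version you may need trust_remote_code=True:

```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpenGVLab/InternVL3-2B-hf", device_map="auto")
messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
    {"type": "text", "text": "Describe this image."},
]}]
out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])
```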
take your favorite VDR model out for multimodal RAG 🤝
> Link to all models, datasets, demos huggingface.co/collections/...
> Text-readable version is here huggingface.co/posts/merve/...
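a bare-bones sketch of the retrieval half of multimodal RAG: embed the query, score it against pre-computed page embeddings, hand the top pages to your VLM (the embeddings here are placeholders for whichever VDR model you pick):

```python
import numpy as np

# placeholders: in practice these come from your VDR model of choice
page_embeddings = np.random.randn(500, 1536).astype("float32")   # one vector per document page
query_embedding = np.random.randn(1536).astype("float32")

# cosine similarity -> top-3 pages
pages = page_embeddings / np.linalg.norm(page_embeddings, axis=1, keepdims=True)
query = query_embedding / np.linalg.norm(query_embedding)
top_k = np.argsort(pages @ query)[::-1][:3]
print("feed these page images to the VLM:", top_k)
```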
@llamaindex.bsky.social released vdr-2b-multi-v1
> uses 70% fewer image tokens, yet outperforms other dse-qwen2-based models
> 3x faster inference with less VRAM 💨
> shrinkable with matryoshka 🪆
huggingface.co/collections/...
Here's everything released, find text-readable version here huggingface.co/posts/merve/...
All models are here huggingface.co/collections/...
🔖 Model collection: huggingface.co/collections/...
🔖 Notebook on how to use: colab.research.google.com/drive/1e8fcb...
🔖 Try it here: huggingface.co/spaces/hysts...
The model is very interesting: it has a separate encoder for each modality (visual prompt, text prompt, image and video), then concatenates these to feed into the LLM 💬
the output segmentation tokens are passed to SAM2 to match text (captions or semantic classes) to masks ⤵️
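roughly, the flow looks like this; everything below is a toy stand-in (simple projections and a tiny transformer), not the model's real modules:

```python
import torch
import torch.nn as nn

d = 256
# one (toy) encoder per modality
enc_text, enc_prompt = nn.Linear(128, d), nn.Linear(64, d)
enc_image, enc_video = nn.Linear(512, d), nn.Linear(512, d)
llm = nn.TransformerEncoder(nn.TransformerEncoderLayer(d, nhead=8, batch_first=True), num_layers=2)

text, vprompt = torch.randn(1, 16, 128), torch.randn(1, 4, 64)
image, video = torch.randn(1, 256, 512), torch.randn(1, 64, 512)

# encode each modality separately, then concatenate into one token sequence for the LLM
tokens = torch.cat([enc_text(text), enc_prompt(vprompt), enc_image(image), enc_video(video)], dim=1)
hidden = llm(tokens)

# in the real model, the hidden states of special [SEG] output tokens (here: just the
# last few positions) are handed to SAM2 as prompts, which produces the matching masks
seg_prompts = hidden[:, -4:]
print(tokens.shape, seg_prompts.shape)
```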
The models are capable of vision-language understanding and visual referring tasks (referring segmentation), both for images and videos ⏯️
@m--ric.bsky.social
> Blog: hf.co/blog/smolage...
> Quickstart: huggingface.co/docs/smolage...
writing a tool and using it is very easy, just decorate the function with `@tool`
what's cooler is that you can push and pull tools from Hugging Face Hub! see below
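a minimal sketch following the smolagents docs (the Hub repo ids are placeholders); the function needs type hints and a docstring describing each argument:

```python
from smolagents import tool, load_tool

@tool
def get_weather(city: str) -> str:
    """Returns a (fake) weather report for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"It is sunny in {city} today."

# push your tool to the Hub, or pull someone else's back down (repo ids are placeholders)
# get_weather.push_to_hub("your-username/get-weather-tool")
# image_gen = load_tool("m-ric/text-to-image", trust_remote_code=True)
```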
Just initialize it with the tool of your choice and the model of your choice
See below how you can get started; you can use the models through the HF Inference API as well as locally!
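for example (model id taken from the smolagents quickstart; swap HfApiModel for TransformersModel to run locally):

```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# HfApiModel goes through the HF Inference API; TransformersModel(...) would run locally
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"),
)
agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")
```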
LLMs can already write code and do reasoning, so why bother writing the tool yourself?
the CodeAgent class is here for exactly that! see it in action below
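for instance, with no hand-written tools at all the agent just writes and runs the Python itself (a small sketch, defaults assumed for the model):

```python
from smolagents import CodeAgent, HfApiModel

# no hand-written tool: the LLM writes the code, the agent executes it
agent = CodeAgent(tools=[], model=HfApiModel(), additional_authorized_imports=["numpy"])
agent.run("Compute the 20th Fibonacci number and return it.")
```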
however cool your LLM is, without being agentic it can only go so far
enter smolagents: a new agent library by @hf.co to make the LLM write code, do analysis and automate boring stuff! huggingface.co/blog/smolage...
QLoRA fine-tuning in 4-bit with a batch size of 4 can be done with 32 GB VRAM and is very fast! ✨
github.com/merveenoyan/...
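a hedged sketch of that setup (the actual script is in the repo above; the base model id and LoRA target modules here are assumptions, not necessarily what the notebook uses):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# base model is an example; any Qwen2.5VL-style checkpoint of similar size fits the 32 GB figure
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", quantization_config=bnb, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# ...then train as usual, e.g. with a Trainer using per_device_train_batch_size=4
```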
I have been looking forward to this feature, as I felt most back-to-back releases are overwhelming and I tend to miss out 🤠