merve
@merve.bsky.social
proud mediterranean 🧿 open-sourceress at hugging face 🤗 multimodality, zero-shot vision, vision language models, transformers
Pinned
I did a 1 hr speed-run on multimodal computer vision (VLMs, multimodal retrieval, zero-shot vision) at MIT AI Visions

it's up on youtube by popular demand www.youtube.com/embed/_TlhKH...
llama.cpp has vision language model support now! ❤️‍🔥

get started with sota VLMs (gemma 3, Qwen2.5VL, InternVL3 & more) and serve them wherever you want 🤩
learn more github.com/ggml-org/lla... 📖
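not official llama.cpp docs, but once you have a llama-server running with a vision-capable GGUF (see the repo above for the exact commands), querying it from Python can look roughly like this:

```python
# rough sketch, not official llama.cpp docs: assumes llama-server is already running
# locally with a vision-capable GGUF (see the repo above for the exact commands and flags)
import base64
import requests

# hypothetical local image, sent as a data URL to the OpenAI-compatible endpoint
with open("receipt.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": "What is written in this image?"},
            ],
        }
    ],
    "max_tokens": 256,
}

# llama-server defaults to port 8080; adjust if you started it differently
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
print(resp.json()["choices"][0]["message"]["content"])
```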
May 11, 2025 at 7:46 AM
If you want to ✨ speed up & harden ✨ your RAG pipelines, use visual document retrieval models ⬇️

We have shipped a how-to guide for VDR models in Hugging Face transformers 🤗📖 huggingface.co/docs/transfo...
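a rough sketch of what retrieval with a VDR model looks like in transformers, using ColPali as the example (class and checkpoint names from memory, the guide above is the source of truth):

```python
# rough sketch of visual document retrieval with transformers, using the ColPali
# integration as an example; names are from memory, the guide above is authoritative
import torch
from PIL import Image
from transformers import ColPaliForRetrieval, ColPaliProcessor

model_id = "vidore/colpali-v1.2-hf"  # assumed checkpoint name
model = ColPaliForRetrieval.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto").eval()
processor = ColPaliProcessor.from_pretrained(model_id)

pages = [Image.open("page_1.png"), Image.open("page_2.png")]  # screenshots of your document pages
queries = ["What was Q3 revenue?"]

with torch.no_grad():
    page_embeddings = model(**processor(images=pages, return_tensors="pt").to(model.device)).embeddings
    query_embeddings = model(**processor(text=queries, return_tensors="pt").to(model.device)).embeddings

# late-interaction scoring: higher score = page is more relevant to the query
scores = processor.score_retrieval(query_embeddings, page_embeddings)
print(scores)
```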
May 2, 2025 at 9:49 AM
Why do people sleep on DSE multimodal retrieval models? 👀

They're just like ColPali, but highly scalable and fast, and you can make them even more efficient with binarization or matryoshka with little degradation 🪆⚡️

I collected some here huggingface.co/collections/...
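to make the binarization / matryoshka part concrete, here's a toy sketch on top of whatever single-vector embeddings your DSE-style model gives you (plain tensor ops, nothing model-specific):

```python
# toy sketch of both tricks on single-vector embeddings (e.g. from a DSE-style model);
# shapes and numbers are made up, this only shows the mechanics
import torch
import torch.nn.functional as F

doc_embeds = torch.randn(1000, 1536)   # pretend these came from your retrieval model
query_embed = torch.randn(1, 1536)

# matryoshka: if the model was trained with a matryoshka objective, keep only the first
# k dimensions and renormalize, trading a little accuracy for speed and memory
k = 512
doc_small = F.normalize(doc_embeds[:, :k], dim=-1)
query_small = F.normalize(query_embed[:, :k], dim=-1)
scores = query_small @ doc_small.T

# binarization: keep only the sign of each dimension (32x smaller than float32),
# then rank by Hamming-style similarity
doc_bits = doc_embeds > 0
query_bits = query_embed > 0
hamming_sim = (query_bits == doc_bits).sum(dim=-1)  # higher = more similar

print(scores.topk(5).indices, hamming_sim.topk(5).indices)
```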
April 15, 2025 at 4:26 PM
I'm so hooked on @hf.co Inference Providers (specifically Qwen2.5-VL-72B) for multimodal agentic workflows with smolagents 🥹

get started ⤵️
> filter models provided by different providers
> test them through the widget or Python/JS/cURL
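the Python route can look roughly like this (provider, model id and image URL are placeholders, pick whatever the model page lists):

```python
# rough sketch of testing a provider-served VLM from Python; provider, model id and
# image URL are placeholders, use whatever you filtered on the Hub
from huggingface_hub import InferenceClient

client = InferenceClient(provider="hyperbolic")  # pick any provider listed on the model page

response = client.chat_completion(
    model="Qwen/Qwen2.5-VL-72B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
                {"type": "text", "text": "Summarize this chart in two sentences."},
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```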
April 15, 2025 at 2:59 PM
my weekly summary on what's released in open AI is up on @hf.co huggingface.co/posts/merve/...

collection is here huggingface.co/collections/...
April 14, 2025 at 12:24 PM
fan-favorite open-source PDF OCR model OlmOCR now has a faster, more efficient take ⚡️

RolmOCR-7B follows the same recipe as OlmOCR: it builds on Qwen2.5VL with training set modifications and improves accuracy & performance 🤝

huggingface.co/reducto/Rolm...
April 14, 2025 at 8:51 AM
Hello friends 👋🏼

If you visit Turkey this summer, know that millions of Turkish people are boycotting: one day a week they buy nothing, and the rest of the week they buy only necessities

if you have plans, here's a post that summarizes where you should buy stuff from www.instagram.com/share/BADrkS...
April 12, 2025 at 8:05 AM
DO NOT SLEEP ON THIS MODEL

Kimi-VL-A3B-Thinking is the first capable open-source reasoning VLM, and it comes with an MIT license ❤️
> it has only 2.8B activated params 👏
> it's agentic 🔥 works on GUIs
> surpasses gpt-4o

I've put it to the test (see below ⤵️) huggingface.co/spaces/moons...
April 11, 2025 at 7:08 PM
InternVL3 is out 💥

> 7 ckpts with various sizes (1B to 78B)
> Built on InternViT encoder and Qwen2.5VL decoder, improves on Qwen2.5VL
> Can do reasoning and document tasks, extends to tool use and agentic capabilities 🤖
> easy to use with Hugging Face transformers 🤗 huggingface.co/collections/...
April 11, 2025 at 1:35 PM
Reposted by merve
Xet infra now backs 1000s of repos on @hf.co, which means we get to put on our researcher hats and peer into the bytes 👀 🤓

Xet clients chunk files into ~64 KB chunks and skip uploads of duplicate content, but what if those chunks already live in _another_ repo? We skip those too.
From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub
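a toy sketch of the dedup idea (real Xet uses content-defined chunk boundaries, this fixed-size version only shows the skip-what-we've-already-seen part):

```python
# toy illustration of chunk-level dedup: hash every chunk and only "upload" ones we
# haven't seen before; real Xet uses content-defined chunk boundaries (~64 KB), not fixed sizes
import hashlib

CHUNK_SIZE = 64 * 1024
seen_chunks: set[str] = set()  # in reality this index spans many repos, not one process

def upload(path: str) -> None:
    uploaded = skipped = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest in seen_chunks:
                skipped += 1   # identical bytes already stored somewhere, skip the transfer
            else:
                seen_chunks.add(digest)
                uploaded += 1  # genuinely new content, send it
    print(f"{path}: uploaded {uploaded} chunks, skipped {skipped}")

upload("model-00001-of-00002.safetensors")   # placeholder file name
upload("model-00001-of-00002.safetensors")   # second upload: everything gets skipped
```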
April 9, 2025 at 3:19 PM
SmolVLM paper is out and it's packed with great findings on training a good smol vision LM!

Andi summarized them below, give it a read if you want to see more insights 🤠
Today, we share the tech report for SmolVLM: Redefining small and efficient multimodal models.
🔥 Explaining how to create a tiny 256M VLM that uses less than 1GB of RAM and outperforms our 80B models from 18 months ago!
huggingface.co/papers/2504....
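running the 256M model looks roughly like this with transformers (checkpoint name and prompt format from memory, the model card is the source of truth):

```python
# rough sketch of running the 256M SmolVLM with transformers; checkpoint name and prompt
# format are from memory, the model card is the source of truth
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("chart.png")  # placeholder image
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(text=prompt, images=[image], return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```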
April 9, 2025 at 1:38 PM
Because of X's policies I'll also be sharing my work-related posts here, feel free to follow along 😊
April 6, 2025 at 11:51 AM
icymi I shipped a tutorial on fine-tuning vision language models on videos ⏯️

learn how to fine-tune SmolVLM2 on the Video Feedback dataset 📖 github.com/merveenoyan/...
smol-vision/Fine_tune_SmolVLM2_on_Video.ipynb · merveenoyan/smol-vision
March 6, 2025 at 3:43 PM
All the multimodal document retrieval models (ColPali, DSE et al) are now under visual document retrieval at @hf.co 📝🤗

take your favorite VDR model out for multimodal RAG 🤝
February 26, 2025 at 11:39 AM
Reposted by merve
Smol but mighty:
• 256M delivers 80% of the performance of our 2.2B model.
• 500M hits 90%.
Both beat our SOTA 80B model from 17 months ago! 🎉

Efficiency 🤝 Performance

Explore the collection here: huggingface.co/collections/...
Blog: huggingface.co/blog/smolervlm
January 23, 2025 at 1:33 PM
Reposted by merve
Introducing the smollest VLMs yet! 🤏
SmolVLM (256M & 500M) runs on <1GB GPU memory.
Fine-tune it on your laptop and run it on your toaster. 🚀
Even the 256M model outperforms our Idefics 80B (Aug '23).
How small can we go? 👀
January 23, 2025 at 1:33 PM
Everything that was released this past week in open AI 🤠

> Link to all models, datasets, demos huggingface.co/collections/...
> Text-readable version is here huggingface.co/posts/merve/...
January 17, 2025 at 3:28 PM
there's a new multimodal retrieval model in town 🤠
@llamaindex.bsky.social released vdr-2b-multi-v1
> uses 70% fewer image tokens, yet outperforms other dse-qwen2 based models
> 3x faster inference with less VRAM 💨
> shrinkable with matryoshka 🪆
huggingface.co/collections/...
January 13, 2025 at 11:11 AM
What a week to open the year in open ML, all the things released at @hf.co 🤠

Here's everything released, find text-readable version here huggingface.co/posts/merve/...

All models are here huggingface.co/collections/...
January 10, 2025 at 2:51 PM
ViTPose, the best open-source pose estimation model, just landed in @hf.co transformers 🕺🏻💃🏻

🔖 Model collection: huggingface.co/collections/...

🔖 Notebook on how to use: colab.research.google.com/drive/1e8fcb...

🔖 Try it here: huggingface.co/spaces/hysts...
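a rough sketch of what usage looks like (class and checkpoint names from memory, the notebook above is the source of truth):

```python
# rough sketch only: class and checkpoint names are from memory, the notebook above is
# the source of truth. ViTPose is top-down, so it expects person boxes from any detector
# and predicts keypoints inside each box.
import torch
from PIL import Image
from transformers import AutoProcessor, VitPoseForPoseEstimation

ckpt = "usyd-community/vitpose-base-simple"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(ckpt)
model = VitPoseForPoseEstimation.from_pretrained(ckpt)

image = Image.open("dancers.jpg")  # placeholder image
# person boxes in COCO (x, y, width, height) format; a single full-image box here,
# in practice these come from a person detector
boxes = [[[0.0, 0.0, float(image.width), float(image.height)]]]

inputs = processor(image, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

poses = processor.post_process_pose_estimation(outputs, boxes=boxes)
print(poses[0][0]["keypoints"])  # keypoints for the first person box
```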
January 9, 2025 at 2:27 PM
ByteDance just dropped SA2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2, with an MIT license 💗

The models can do vision-language understanding and visual referring tasks (referring segmentation), for both images and videos ⏯️
January 9, 2025 at 12:00 PM
supercharge your LLM apps with smolagents 🔥

however cool your LLM is, without being agentic it can only go so far

enter smolagents: a new agent library by @hf.co to make the LLM write code, do analysis and automate boring stuff! huggingface.co/blog/smolage...
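the hello-world is roughly this (model id is just an example, the blog above has the full tour):

```python
# minimal smolagents sketch (model id is just an example): the agent writes and runs
# Python code to solve the task instead of only replying with text
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")  # any capable chat model works
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

agent.run("How many seconds would it take a leopard at full speed to run through Pont des Arts?")
```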
December 31, 2024 at 3:32 PM
ColPali has landed in @hf.co transformers and I have just shipped a very lean fine-tuning tutorial in smol-vision 🤠💗

QLoRA fine-tuning in 4-bit with a batch size of 4 can be done with 32 GB VRAM and is very fast! ✨
github.com/merveenoyan/...
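the core of the recipe, roughly (the notebook has the real config; target modules and hyperparameters here are illustrative):

```python
# rough sketch of the QLoRA setup: load the base model in 4-bit, then train small LoRA
# adapters on top; hyperparameters and target modules here are illustrative, the notebook
# linked above has the real config
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
)

# these go into .from_pretrained(..., quantization_config=bnb_config) and
# peft.get_peft_model(model, lora_config); training then runs with a per-device batch size of 4
```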
December 20, 2024 at 3:53 PM