Lightnews — Scholar-powered news

Yuki Asano @yukimasano.bsky.social · 1d

On the occasion of the 1000th citation of our Sinkhorn-Knopp self-supervised representation learning paper, I've written a whole post about the history and the key bits of this method that powers the state-of-the-art SSL vision models.

Read it here :): docs.google.com/document/d/1...

1 4 16

Yuki Asano @yukimasano.bsky.social · Jul 21

Today, we release Franca, a new vision Foundation Model that matches and often outperforms DINOv2.
The data, the training code and the model weights are open-source.

This is the result of a close and fun collaboration
@valeoai.bsky.social (in France) and @funailab.bsky.social (in Franconia)🚀

Andrei Bursuc @abursuc.bsky.social · Jul 21

1/ Can open-data models beat DINOv2? Today we release Franca, a fully open-sourced vision foundation model. Franca with ViT-G backbone matches (and often beats) proprietary models like SigLIPv2, CLIP, DINOv2 on various benchmarks setting a new standard for open-source research.

4 21

Yuki Asano @yukimasano.bsky.social · Dec 23

Agreed, very interesting! Future engines that run on information? 🤯

wellingmax.bsky.social @wellingmax.bsky.social · Dec 22

www.quantamagazine.org/what-is-entr...

What Is Entropy? A Measure of Just How Little We Really Know. | Quanta Magazine

Exactly 200 years ago, a French engineer introduced an idea that would quantify the universe’s inexorable slide into decay. But entropy, as it’s currently understood, is less a fact about the world th...

www.quantamagazine.org

7

Yuki Asano @yukimasano.bsky.social · Dec 10

Our Lab is now also on bsky! 🥳

Fundamental AI Lab @funailab.bsky.social · Dec 10

Hello world!
We're the Fundamental AI Lab, lead by @yukimasano.bsky.social at the UTN in Nuremberg.

We research computer vision, multimodal learning and adapting Foundation Models! Follow us :)

2 1 21

Reposted by Yuki Asano

Andreas Steiner @andreaspsteiner.bsky.social · Dec 5

🚀🚀PaliGemma 2 is our updated and improved PaliGemma release using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px,448px,896px} resolutions and {3B,10B,28B} model sizes.

1/7

1 21 68

Reposted by Yuki Asano

Dima Damen @dimadamen.bsky.social · Dec 4

Pls RT
Permanent Assistant Professor (Lecturer) position in Computer Vision @bristoluni.bsky.social [DL 6 Jan 2025]
This is a research+teaching permanent post within MaVi group uob-mavi.github.io in Computer Science. Suitable for strong postdocs or exceptional PhD graduates.
t.co/k7sRRyfx9o
1/2

https://tinyurl.com/BristolCVLectureship

t.co

1 14 23

Yuki Asano @yukimasano.bsky.social · Dec 2

Today we had a joint workshop between our FunAI Lab, UTN and AIST Japan. 13 talks, 1 cake and lots of Bavarian food really get research discussions going!
Towards more collaborations in AI between 🇩🇪 & 🇯🇵.
@hirokatukataoka.bsky.social

1 6

Yuki Asano @yukimasano.bsky.social · Nov 28

Thanks for tagging. In addition have a look at NV-Embed paper: arxiv.org/abs/2405.17428 they do contrastive finetuning after turning on the bidirectional attention mask

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

Decoder-only large language model (LLM)-based embedding models are beginning to outperform BERT or T5-based embedding models in general-purpose text embedding tasks, including dense vector-based retri...

arxiv.org

1 3

Yuki Asano @yukimasano.bsky.social · Nov 27

Also @phdcomics.bsky.social is on 🦋 👏. slowly nesting here.

PHD Comics @phdcomics.com · Sep 18

Marriage vs PhD

4

Yuki Asano @yukimasano.bsky.social · Nov 27

Yay, @xkcd.com is on 🦋

Randall Munroe @xkcd.com · Nov 27

Cold Air xkcd.com/3016

2

Yuki Asano @yukimasano.bsky.social · Nov 26

Nice 👏! We love small (M)LLMs :) will training code also be released?

Thomas Wolf @thomwolf.bsky.social · Nov 26

Releasing SmolVLM, a small 2 billion parameters Vision+Language Model (VLM) built for on-device/in-browser inference with images/videos.

Outperforms all models at similar GPU RAM usage and tokens throughputs

Blog post: huggingface.co/blog/smolvlm

1 3

Yuki Asano @yukimasano.bsky.social · Nov 26

and also perhaps interesting for you: probing text-representations of LLMs for CLIP-like zero-shot classification: arxiv.org/abs/2410.07173

Do better language models have crisper vision?

How well do text-only Large Language Models (LLMs) grasp the visual world? As LLMs are increasingly used in computer vision, addressing this question becomes both fundamental and pertinent. However, e...

arxiv.org

3

Reposted by Yuki Asano

Nanne van Noord @nanne.bsky.social · Nov 26

Sam next to his poster; I'm still very impressed he did all this for his MSc thesis! #BMVC2024

1 5

Yuki Asano @yukimasano.bsky.social · Nov 26

exactly. hence the new post-(pre)training term perhaps? post-training seems to be a good generic term for the RLHF/preference tuning etc in NLP allenai.org/papers/tulu-.... so by saying post-pretraining, we could emphasize the fact it's unsupervised

allenai.org

1

Yuki Asano @yukimasano.bsky.social · Nov 26

"Post-pretraining", "unsupervised domain adaptation" fits, but I think is used for different tasks

2 2

Yuki Asano @yukimasano.bsky.social · Nov 26

This work was led by Jochem Loedeman in his MSc, and supervised by Maarten Stol, Tengda Han and myself.
📓: arxiv.org/abs/2210.06466 
Visit BMVC poster 532 at 10am today!

Prompt Generation Networks for Input-Space Adaptation of Frozen Vision Transformers

With the introduction of the transformer architecture in computer vision, increasing model scale has been demonstrated as a clear path to achieving performance and robustness gains. However, with mode...

arxiv.org

1 8

Yuki Asano @yukimasano.bsky.social · Nov 26

This means we can simply send an adapted RGB image to the server to get a personalised output.
We also show that the gains don't just come from adding a new learnable model, but instead from the interplay between the pretrained one and the PGN.

1 2

Yuki Asano @yukimasano.bsky.social · Nov 26

This CNN (e.g. running on a phone) outputs a softmax over a set of learned tokens. These are then combined and used for the adaptation. This allows efficient learning, but also for moving the signal back into pixel-space via pseudo-inverse.

1 1

Yuki Asano @yukimasano.bsky.social · Nov 26

Also known as reprogramming, works from @phillipisola.bsky.social showed that even adjusting singular pixels allows adapting a model. We take this one step further and make the input-only adaptation signal dependent on the image itself: We introduce a lightweight CNN, the Prompt Generation Network.

2 4

Yuki Asano @yukimasano.bsky.social · Nov 26

LoRA is great but one disadvantage is that if you have 1000s of these adapters and want to serve them in an efficient way, it's very difficult: GPUs are inefficient when you e.g. use one adapter for only one sample in a large batch. The solution is to adapt the model strictly in input-space.

1 3

Yuki Asano @yukimasano.bsky.social · Nov 26

LoRA et al. enable personalised model generation and serving, which is crucial as finetuned models still outperform general ones in many tasks. However, serving a base model with many LoRAs is very inefficient! Now, there's a better way: enter Prompt Generation Networks, presented today #BMVC

1 5 31

Yuki Asano @yukimasano.bsky.social · Nov 20

Hello world!
Is there any tool to sync twitter and bluesky posting?

1 3

Reposted by Yuki Asano

Kosta Derpanis @csprofkgd.bsky.social · Nov 19

My growing list of #computervision researchers on Bsky.

Missed you? Let me know.

go.bsky.app/M7HGC3Y

88 42 130

Reposted by Yuki Asano

Jeremy Howard @howard.fm · Nov 19

The thingie that brings over your twitter followers worked jolly well for me. Very cool! I am following another 500 people now thanks to that…
chromewebstore.google.com/detail/sky-f...

Sky Follower Bridge - Chrome Web Store

Instantly find and follow the same users from your Twitter follows on Bluesky.

chromewebstore.google.com

13 11 150