Yuki Asano
@yukimasano.bsky.social
Professor at University of Technology Nuremberg
Head of Fundamental AI Lab
Thanks for tagging. In addition, have a look at the NV-Embed paper (arxiv.org/abs/2405.17428): they do contrastive finetuning after turning on the bidirectional attention mask
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Decoder-only large language model (LLM)-based embedding models are beginning to outperform BERT or T5-based embedding models in general-purpose text embedding tasks, including dense vector-based retri...
arxiv.org
November 28, 2024 at 5:53 PM
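For readers new to this training style, here is a minimal sketch (my paraphrase, not the NV-Embed code) of in-batch contrastive finetuning on top of such an embedding model. `encode` is a hypothetical stand-in for the decoder-only LLM with its causal mask switched off (bidirectional attention) and token states pooled into one vector per text.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(q_emb, p_emb, temperature=0.05):
    """In-batch negatives: the i-th query should match the i-th passage."""
    q = F.normalize(q_emb, dim=-1)
    p = F.normalize(p_emb, dim=-1)
    logits = q @ p.t() / temperature                     # (B, B) similarity logits
    targets = torch.arange(q.size(0), device=q.device)   # diagonal pairs are positives
    return F.cross_entropy(logits, targets)

# usage sketch, assuming encode(texts) -> (B, D) embeddings from the bidirectional LLM:
# loss = info_nce_loss(encode(queries), encode(passages))
# loss.backward()
```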
and perhaps also interesting for you: probing the text representations of LLMs for CLIP-like zero-shot classification: arxiv.org/abs/2410.07173
Do better language models have crisper vision?
How well do text-only Large Language Models (LLMs) grasp the visual world? As LLMs are increasingly used in computer vision, addressing this question becomes both fundamental and pertinent. However, e...
arxiv.org
November 26, 2024 at 1:21 PM
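As a rough illustration of the kind of readout involved (a generic CLIP-style sketch, not necessarily the linked paper's exact recipe): class names are embedded by a text model, images by a vision encoder, and a class is picked by cosine similarity after projecting into a shared space.

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(image_feats, class_text_feats, projection=None):
    # image_feats: (B, D_img); class_text_feats: (C, D_txt)
    # `projection` is an assumed (learned) linear map aligning the two spaces.
    if projection is not None:
        image_feats = image_feats @ projection     # (B, D_txt)
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(class_text_feats, dim=-1)
    return (img @ txt.t()).argmax(dim=-1)          # predicted class index per image
```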
Reposted by Yuki Asano
Sam next to his poster; I'm still very impressed he did all this for his MSc thesis! #BMVC2024
November 26, 2024 at 10:25 AM
exactly. Hence the new "post-(pre)training" term, perhaps? "Post-training" seems to be a good generic term for RLHF/preference tuning etc. in NLP (allenai.org/papers/tulu-...), so by saying "post-pretraining" we could emphasize the fact that it's unsupervised
allenai.org
November 26, 2024 at 8:30 AM
"Post-pretraining", "unsupervised domain adaptation" fits, but I think is used for different tasks
November 26, 2024 at 8:01 AM
This work was led by Jochem Loedeman during his MSc, and supervised by Maarten Stol, Tengda Han and myself.
📓: arxiv.org/abs/2210.06466
Visit BMVC poster 532 at 10am today!
Prompt Generation Networks for Input-Space Adaptation of Frozen Vision Transformers
With the introduction of the transformer architecture in computer vision, increasing model scale has been demonstrated as a clear path to achieving performance and robustness gains. However, with mode...
arxiv.org
November 26, 2024 at 7:28 AM
This means we can simply send an adapted RGB image to the server to get a personalised output.
We also show that the gains don't just come from adding a new learnable model, but from the interplay between the frozen pretrained model and the PGN.
November 26, 2024 at 7:28 AM
This CNN (e.g. running on a phone) outputs a softmax over a set of learned tokens. These are then combined and used for the adaptation. This allows for efficient learning, and also for moving the signal back into pixel space via a pseudo-inverse.
November 26, 2024 at 7:28 AM
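A minimal sketch of how I read that mechanism (module sizes and names are illustrative, not the authors' code): a small CNN predicts, per prompt slot, a softmax over a library of learned tokens, and the weighted combinations become the prompts for the frozen ViT. With a linear patch-embedding layer, those prompt tokens can then be pushed back into pixel space via its pseudo-inverse, as described above.

```python
import torch
import torch.nn as nn

class PromptGenerationNetwork(nn.Module):
    def __init__(self, n_library=256, n_prompts=16, dim=768):
        super().__init__()
        # lightweight CNN backbone (placeholder; the actual network is a design choice)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_logits = nn.Linear(64, n_prompts * n_library)
        self.token_library = nn.Parameter(torch.randn(n_library, dim))
        self.n_prompts, self.n_library = n_prompts, n_library

    def forward(self, images):                          # images: (B, 3, H, W)
        logits = self.to_logits(self.cnn(images))       # (B, n_prompts * n_library)
        logits = logits.view(-1, self.n_prompts, self.n_library)
        weights = logits.softmax(dim=-1)                 # softmax over the learned tokens
        prompts = weights @ self.token_library           # (B, n_prompts, dim) convex combinations
        return prompts                                   # used to adapt the frozen model's input
```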
Also known as reprogramming, works from @phillipisola.bsky.social showed that even adjusting individual pixels allows adapting a model. We take this one step further and make the input-only adaptation signal dependent on the image itself: we introduce a lightweight CNN, the Prompt Generation Network.
November 26, 2024 at 7:28 AM
LoRA is great, but one disadvantage is that serving thousands of these adapters efficiently is very difficult: GPUs are inefficient when, e.g., one adapter applies to only a single sample in a large batch. The solution is to adapt the model strictly in input space.
November 26, 2024 at 7:28 AM
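To make the serving argument concrete, a toy sketch (my illustration, not the paper's code): each client's adaptation lives entirely in its own pixels, so one frozen model can process a batch mixing arbitrarily many "adapters" in a single forward pass.

```python
import torch

def serve_batch(frozen_model, images, per_client_deltas):
    # images, per_client_deltas: (B, 3, H, W); one learned perturbation per client,
    # e.g. produced on-device by a PGN-like module
    adapted = (images + per_client_deltas).clamp(0, 1)   # adaptation happens in RGB space only
    with torch.no_grad():
        return frozen_model(adapted)                      # one shared forward pass, no per-client weights
```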