Giwon Hong
@giwonhong.bsky.social
PhD student in ILCC (NLP) program at the University of Edinburgh
🔍 Conclusion: 𝗠𝗼𝗜𝗖𝗟 offers a robust, efficient approach for combining demonstration subsets (experts), significantly boosting accuracy over baselines. 𝗠𝗼𝗜𝗖𝗟 is also resilient to low-quality demonstrations and achieves improved data and computational efficiency. (🧵7/n)
November 18, 2024 at 6:38 PM
⚙️ Data and Compute Efficiency of 𝗠𝗼𝗜𝗖𝗟: We find that 𝗠𝗼𝗜𝗖𝗟 is more efficient in terms of data and computation compared to conventional (concat-based) ICL! (🧵6/n)
November 18, 2024 at 6:38 PM
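A back-of-envelope sketch of the compute side (my own illustration with made-up sizes, assuming self-attention cost quadratic in context length):

```python
# Rough attention-cost comparison (illustration only; sizes are hypothetical).
# Self-attention is quadratic in context length, so k independent short
# prompts are cheaper than one long concatenated prompt.
def attention_cost(tokens: int) -> int:
    return tokens * tokens  # proportional units

k, per_subset = 7, 512                      # e.g. 7 subsets of 512 tokens each
concat = attention_cost(k * per_subset)     # conventional ICL: one long prompt
moicl = k * attention_cost(per_subset)      # MoICL: k shorter prompts (parallelizable)
print(concat / moicl)                       # -> 7.0: roughly a factor-of-k saving
```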
📉 Noisy and Imbalanced Demonstrations: By assigning weights to each demonstration subset, 𝗠𝗼𝗜𝗖𝗟 can effectively handle various practical applications where data quality varies. (🧵5/n)
November 18, 2024 at 6:38 PM
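A toy illustration of why this helps (my own numbers, not from the paper): once training pushes a noisy subset's weight down, the softmax lets the mixture effectively ignore that expert.

```python
import torch

# Hypothetical learned weights; suppose the third demonstration subset is noisy.
weights = torch.tensor([2.0, 2.1, -5.0])
w = torch.softmax(weights, dim=0)
print(w)  # ~[0.475, 0.525, 0.0004]: the noisy expert barely contributes
```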
🌐 Generalization to Unseen Demonstrations: 𝙨𝙘𝙖𝙡𝙖𝙧 weights require predefined demonstration subsets.
To go beyond them, we instead use a 𝙃𝙮𝙥𝙚𝙧-𝙣𝙚𝙩𝙬𝙤𝙧𝙠: a smaller fine-tuned network that dynamically generates the weight for each expert from all concatenated demonstration subsets. (🧵4/n)
November 18, 2024 at 6:37 PM
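A minimal sketch of what such a hyper-network could look like (names and architecture are my own simplification, assuming each demonstration subset is first encoded into a fixed-size vector):

```python
import torch
import torch.nn as nn

class WeightHyperNetwork(nn.Module):
    """Maps demonstration-subset encodings to one weight per expert (sketch)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)  # one scalar score per subset

    def forward(self, subset_encodings: torch.Tensor) -> torch.Tensor:
        # subset_encodings: (k, hidden_dim), one vector per demonstration
        # subset, e.g. from a small fine-tuned encoder
        scores = self.score(subset_encodings).squeeze(-1)  # (k,)
        return torch.softmax(scores, dim=0)                # expert weights

# Because the weights are a function of the subsets themselves (not a fixed
# vector), new, unseen demonstration subsets get weights at inference time:
weights = WeightHyperNetwork(768)(torch.randn(5, 768))  # 5 unseen subsets -> 5 weights
```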
📊 𝗠𝗼𝗜𝗖𝗟 in Classification Tasks: 𝗠𝗼𝗜𝗖𝗟 outperformed baseline ICL on 5 out of 7 datasets!
Using 𝙨𝙘𝙖𝙡𝙖𝙧 weights, a vector of trainable parameters assigning one weight per expert, we learn how the demonstration subsets' predictions are combined. (🧵3/n)
November 18, 2024 at 6:37 PM
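A minimal training sketch for the scalar variant (my own code, not the authors'; the per-expert distributions are faked with random tensors so the snippet runs standalone):

```python
import torch

k, vocab = 7, 32000                              # hypothetical sizes
weights = torch.zeros(k, requires_grad=True)     # the ONLY trained parameters
opt = torch.optim.Adam([weights], lr=1e-2)

for step in range(100):
    # In practice these would be next-token distributions from the frozen LLM,
    # one per demonstration subset (see the sketch after post 2/n below);
    # random stand-ins keep this snippet self-contained.
    expert_probs = torch.rand(k, vocab).softmax(dim=-1).detach()
    gold_token = torch.randint(vocab, ())        # dummy gold next-token id

    mixture = torch.softmax(weights, dim=0) @ expert_probs  # (vocab,)
    loss = -torch.log(mixture[gold_token])                  # NLL of gold token
    opt.zero_grad()
    loss.backward()
    opt.step()
```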
🚀 How does 𝗠𝗼𝗜𝗖𝗟 improve In-Context Learning? 𝗠𝗼𝗜𝗖𝗟 prompts an LLM separately with each of several demonstration subsets, treating each prompted run as an expert, and merges the experts' predictions via a trainable weighting function. No fine-tuning of the LLM parameters is required! (🧵2/n)
November 18, 2024 at 6:37 PM
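A minimal sketch of this core step (my own simplification, assuming a Hugging Face-style causal LM and tokenizer; the paper's exact merging may differ):

```python
import torch
import torch.nn.functional as F

def moicl_next_token_dist(model, tokenizer, subsets, query, weights):
    """Weighted mixture of per-expert next-token distributions (sketch).

    subsets: k demonstration strings, one per expert
    weights: trainable tensor of shape (k,)
    """
    w = torch.softmax(weights, dim=0)       # normalize the expert weights
    mixture = 0.0
    for w_i, demos in zip(w, subsets):
        inputs = tokenizer(demos + "\n" + query, return_tensors="pt")
        with torch.no_grad():               # the LLM itself is never fine-tuned
            logits = model(**inputs).logits[0, -1]
        mixture = mixture + w_i * F.softmax(logits, dim=-1)
    return mixture                          # gradients flow only into `weights`
```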
I would love to be added as well!
November 17, 2024 at 8:18 PM