Giwon Hong
@giwonhong.bsky.social
PhD student in ILCC (NLP) program at the University of Edinburgh
🔍 Conclusion: 𝗠𝗼𝗜𝗖𝗟 offers a robust, efficient approach for combining demonstration subsets (experts), significantly boosting accuracy over baselines. 𝗠𝗼𝗜𝗖𝗟 is also resilient to low-quality demonstrations and achieves improved data and computational efficiency. (🧵7/n)
November 18, 2024 at 6:38 PM
⚙️ Data and Compute Efficiency of 𝗠𝗼𝗜𝗖𝗟: We find that 𝗠𝗼𝗜𝗖𝗟 is more efficient in terms of data and computation compared to conventional (concat-based) ICL! (🧵6/n)
November 18, 2024 at 6:38 PM
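A back-of-envelope sketch of the compute side (my own illustration with made-up sizes, assuming self-attention cost quadratic in context length):

```python
# Rough attention-cost comparison (illustration only; sizes are hypothetical).
# Self-attention is quadratic in context length, so k independent short
# prompts are cheaper than one long concatenated prompt.
def attention_cost(tokens: int) -> int:
    return tokens * tokens  # proportional units

k, per_subset = 7, 512                      # e.g. 7 subsets of 512 tokens each
concat = attention_cost(k * per_subset)     # conventional ICL: one long prompt
moicl = k * attention_cost(per_subset)      # MoICL: k shorter prompts (parallelizable)
print(concat / moicl)                       # -> 7.0: roughly a factor-of-k saving
```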
📉 Noisy and Imbalanced Demonstrations: By assigning weights to each demonstration subset, 𝗠𝗼𝗜𝗖𝗟 can effectively handle various practical applications where data quality varies. (🧵5/n)
November 18, 2024 at 6:38 PM
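A toy illustration of why this helps (my own numbers, not from the paper): once training pushes a noisy subset's weight down, the softmax lets the mixture effectively ignore that expert.

```python
import torch

# Hypothetical learned weights; suppose the third demonstration subset is noisy.
weights = torch.tensor([2.0, 2.1, -5.0])
w = torch.softmax(weights, dim=0)
print(w)  # ~[0.475, 0.525, 0.0004]: the noisy expert barely contributes
```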
🌐 Generalization to Unseen Demonstrations: 𝙨𝙘𝙖𝙡𝙖𝙧 weights require predefined demonstration subsets.
To go beyond them, we instead use a 𝙃𝙮𝙥𝙚𝙧-𝙣𝙚𝙩𝙬𝙤𝙧𝙠: a smaller fine-tuned network that dynamically generates the weight for each expert from all concatenated demonstration subsets. (🧵4/n)
November 18, 2024 at 6:37 PM
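A minimal sketch of what such a hyper-network could look like (names and architecture are my own simplification, assuming each demonstration subset is first encoded into a fixed-size vector):

```python
import torch
import torch.nn as nn

class WeightHyperNetwork(nn.Module):
    """Maps demonstration-subset encodings to one weight per expert (sketch)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)  # one scalar score per subset

    def forward(self, subset_encodings: torch.Tensor) -> torch.Tensor:
        # subset_encodings: (k, hidden_dim), one vector per demonstration
        # subset, e.g. from a small fine-tuned encoder
        scores = self.score(subset_encodings).squeeze(-1)  # (k,)
        return torch.softmax(scores, dim=0)                # expert weights

# Because the weights are a function of the subsets themselves (not a fixed
# vector), new, unseen demonstration subsets get weights at inference time:
weights = WeightHyperNetwork(768)(torch.randn(5, 768))  # 5 unseen subsets -> 5 weights
```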
📊 𝗠𝗼𝗜𝗖𝗟 in Classification Tasks: 𝗠𝗼𝗜𝗖𝗟 outperformed baseline ICL on 5 out of 7 datasets!
Using 𝙨𝙘𝙖𝙡𝙖𝙧 weights, a vector of trainable parameters assigning one weight per expert, we learn how the demonstration subsets' predictions are combined. (🧵3/n)
November 18, 2024 at 6:37 PM
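A minimal training sketch for the scalar variant (my own code, not the authors'; the per-expert distributions are faked with random tensors so the snippet runs standalone):

```python
import torch

k, vocab = 7, 32000                              # hypothetical sizes
weights = torch.zeros(k, requires_grad=True)     # the ONLY trained parameters
opt = torch.optim.Adam([weights], lr=1e-2)

for step in range(100):
    # In practice these would be next-token distributions from the frozen LLM,
    # one per demonstration subset (see the sketch after post 2/n below);
    # random stand-ins keep this snippet self-contained.
    expert_probs = torch.rand(k, vocab).softmax(dim=-1).detach()
    gold_token = torch.randint(vocab, ())        # dummy gold next-token id

    mixture = torch.softmax(weights, dim=0) @ expert_probs  # (vocab,)
    loss = -torch.log(mixture[gold_token])                  # NLL of gold token
    opt.zero_grad()
    loss.backward()
    opt.step()
```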
🚀 How does 𝗠𝗼𝗜𝗖𝗟 improve In-Context Learning? 𝗠𝗼𝗜𝗖𝗟 prompts an LLM separately with each of several demonstration subsets, treating each prompted run as an expert, and merges the experts' predictions via a trainable weighting function. No fine-tuning of the LLM parameters is required! (🧵2/n)
November 18, 2024 at 6:37 PM
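A minimal sketch of this core step (my own simplification, assuming a Hugging Face-style causal LM and tokenizer; the paper's exact merging may differ):

```python
import torch
import torch.nn.functional as F

def moicl_next_token_dist(model, tokenizer, subsets, query, weights):
    """Weighted mixture of per-expert next-token distributions (sketch).

    subsets: k demonstration strings, one per expert
    weights: trainable tensor of shape (k,)
    """
    w = torch.softmax(weights, dim=0)       # normalize the expert weights
    mixture = 0.0
    for w_i, demos in zip(w, subsets):
        inputs = tokenizer(demos + "\n" + query, return_tensors="pt")
        with torch.no_grad():               # the LLM itself is never fine-tuned
            logits = model(**inputs).logits[0, -1]
        mixture = mixture + w_i * F.softmax(logits, dim=-1)
    return mixture                          # gradients flow only into `weights`
```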
I would love to be added as well!
November 17, 2024 at 8:18 PM