Chandler Smith
@chansmi.bsky.social
Multi-Agent Researcher at CAIF | applied research at IQT | Thinking about making MA systems go well
We see very strong performance across MATH, GSM8K, and CommonsenseQA against trained and untrained baselines with Llama 3.1 8B!
December 6, 2024 at 10:38 PM
Just by looking at these trees, how do you tell which branches are useful for post-training without human feedback or trained PRMs? Value iteration offers a simple approach: propagate labels back through the branches, with a thresholding factor to label the quality of reasoning steps.
December 6, 2024 at 10:38 PM
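The label propagation described above can be sketched roughly as follows. This is an illustrative assumption of how value-iteration-style backup over a reasoning tree might look, not MALT's exact formulation; the `Node` fields, mean-over-children backup, and threshold value are all hypothetical choices.

```python
# Hypothetical sketch: back up terminal correctness rewards from the leaves of
# a reasoning-step tree, then threshold each node's value into a binary label
# usable for post-training. Field names and the 0.5 threshold are assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    reward: float = 0.0                      # terminal reward at leaves (1.0 correct, 0.0 not)
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0                       # propagated value estimate
    label: int = 0                           # binary quality label after thresholding


def propagate(node: Node, threshold: float = 0.5) -> float:
    """A step's value is the mean value of its children; leaves keep their reward."""
    if not node.children:
        node.value = node.reward
    else:
        node.value = sum(propagate(c, threshold) for c in node.children) / len(node.children)
    node.label = int(node.value >= threshold)  # threshold into a training label
    return node.value


# Tiny tree: one branch ends correct, the other incorrect.
good_leaf = Node(reward=1.0)
bad_leaf = Node(reward=0.0)
root = Node(children=[Node(children=[good_leaf]), Node(children=[bad_leaf])])
propagate(root)
# root.value averages its two sub-branches (0.5); the good branch's parent gets value 1.0.
```

With real search trees, one would replace the hand-built tree with sampled rollouts and attach rewards from an answer checker; the backup itself stays the same.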
Our goal was to develop techniques where a system of multiple models could be trained together. We use a generator, critic, and refinement setting that mimics how humans might interact with LLMs.
December 6, 2024 at 10:38 PM
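The generator-critic-refinement setting above can be sketched as a simple three-stage pipeline. This is a minimal illustration assuming each role is an LLM behind a `complete(prompt) -> str` callable; the function name, role prompts, and toy model here are hypothetical, not MALT's actual prompts or models.

```python
# Illustrative sketch of a generator -> critic -> refinement pipeline.
# `complete` stands in for any LLM call; prompts are placeholder assumptions.
from typing import Callable


def malt_style_pipeline(question: str, complete: Callable[[str], str]) -> str:
    # 1. Generator drafts an initial answer.
    draft = complete(f"Answer the question.\nQ: {question}\nA:")
    # 2. Critic reviews the draft and flags possible errors.
    critique = complete(
        f"Critique this answer for mistakes.\nQ: {question}\nDraft: {draft}\nCritique:"
    )
    # 3. Refiner produces a final answer conditioned on the draft and critique.
    final = complete(
        f"Refine the draft using the critique.\nQ: {question}\n"
        f"Draft: {draft}\nCritique: {critique}\nFinal:"
    )
    return final


# Usage with a toy stand-in for a model call:
def toy_model(prompt: str) -> str:
    return "42" if "Final:" in prompt else "needs work"


print(malt_style_pipeline("What is 6 * 7?", toy_model))
```

In a trained setup, each of the three roles would be a separate specialized model rather than one shared callable, which is what makes per-role credit assignment necessary.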
🚀🚨 Excited to announce our work on Multi-Agent LLM Training!

MALT is a multi-agent configuration that leverages synthetic data generation and credit-assignment strategies for post-training specialized models that solve problems together.
December 6, 2024 at 10:38 PM