Paper: arxiv.org/abs/2504.09763
Project Page: zaidkhan.me/EFAGen
Datasets + Models: huggingface.co/collections/...
HF Paper: huggingface.co/papers/2504....
We demonstrate this by inferring EFAs on the NuminaMath dataset, which includes problems ranging from grade school to olympiad level. EFAGen can successfully infer EFAs for all math sources in NuminaMath, even olympiad-level problems.
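For a concrete picture, here is a minimal sketch of the kind of program an EFA corresponds to: a parameterized problem family you can sample, render, and solve in code. The class name, method names, and the toy problem below are illustrative assumptions, not the exact interface from the paper or release.

```python
import random

# Hypothetical sketch of an EFA (Executable Functional Abstraction) for a
# simple rate problem. Names (sample_args, render, solve) and the problem
# itself are illustrative assumptions, not EFAGen's actual interface.
class TrainMeetingEFA:
    def sample_args(self, rng: random.Random) -> dict:
        # Sample the free parameters of the problem family.
        return {
            "distance": rng.randrange(100, 1000, 10),  # km between stations
            "speed_a": rng.randrange(40, 120, 5),       # km/h
            "speed_b": rng.randrange(40, 120, 5),       # km/h
        }

    def render(self, args: dict) -> str:
        # Turn sampled parameters into a natural-language problem statement.
        return (
            f"Two trains start {args['distance']} km apart and travel toward "
            f"each other at {args['speed_a']} km/h and {args['speed_b']} km/h. "
            "After how many hours do they meet?"
        )

    def solve(self, args: dict) -> float:
        # Ground-truth answer computed directly in code.
        return args["distance"] / (args["speed_a"] + args["speed_b"])


# Sampling the EFA yields arbitrarily many problem variants with answers.
efa = TrainMeetingEFA()
rng = random.Random(0)
for _ in range(3):
    args = efa.sample_args(rng)
    print(efa.render(args), "->", efa.solve(args))
```

Because render() and solve() share the same sampled arguments, every generated variant comes with a programmatically checkable answer.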
Getting high-quality math data is expensive. EFAGen offers a way to improve upon existing math training data by generating problem variants through EFAs. EFA-based augmentation leads to consistent improvements across all evaluation metrics.
We self-train Llama-3.1-8B-Instruct with rejection finetuning using our derived unit tests as a verifiable reward signal and see substantial improvements in the model’s ability to infer EFAs, especially on harder problems.
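Roughly, the self-training loop is rejection sampling filtered by the derived unit tests, followed by finetuning on the survivors. A minimal sketch, assuming hypothetical helpers generate_efa_candidates (model sampling) and run_unit_tests (test execution); only the filter-then-finetune structure is taken from the description above.

```python
# Hypothetical sketch of one round of rejection finetuning with unit tests as
# a verifiable reward. The helper functions are stand-ins, not the released code.
def rejection_finetuning_round(problems, generate_efa_candidates, run_unit_tests, k=8):
    accepted = []
    for problem in problems:
        # Sample k candidate EFA programs from the current model.
        for candidate in generate_efa_candidates(problem, num_samples=k):
            # Keep a candidate only if it passes all derived unit tests,
            # e.g. it reproduces the original problem and its answer.
            if run_unit_tests(problem, candidate):
                accepted.append({"prompt": problem, "completion": candidate})
                break  # one verified EFA per problem is enough
    # `accepted` becomes the SFT data for the next round of finetuning.
    return accepted
```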
➡️ EFAGen can infer EFAs for diverse + difficult math problems
➡️ Use EFAs to find + generate harder variants of existing math problems
➡️ LLMs can self-improve at writing EFAs
-- improving generation faithfulness via multi-agent collaboration
(PS. Also a big thanks to ACs+reviewers for their effort!)
-- generative infinite games
-- procedural+predictive video representation learning
-- bootstrapping VLN via self-refining data flywheel
-- automated preference data synthesis
-- diagnosing cultural bias of VLMs
-- adaptive decoding to balance contextual+parametric knowledge conflicts
🧵
-- balancing fast+slow System-1.x planning
-- balancing agents' persuasion resistance+acceptance
-- multimodal compositional+modular video reasoning
-- reverse thinking for stronger LLM reasoning
-- lifelong multimodal instruction tuning via dynamic data selection
🧵