The same 99% will happen here too, but if AI researchers continue to get perma-banned for making available the datasets needed to filter it, it’s going to make this platform unusable.
The same 99% will happen here too, but if AI researchers continue to get perma-banned for making available the datasets needed to filter it, it’s going to make this platform unusable.
By adding a few lines of code to the base Llama 3 tokenizer, he got a free boost in arithmetic performance 😮
[thread]
By adding a few lines of code to the base Llama 3 tokenizer, he got a free boost in arithmetic performance 😮
[thread]
🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games!
BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come.
1/🧵
🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games!
BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come.
1/🧵