ChatGPT can follow hand logic well, but sometimes drops real clunkers. A ReAct-style loop with solver calls mid-conversation could get stuck in the weeds of EV calcs.
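Roughly the shape I'm imagining, as a toy sketch. The `llm` callable and `solver.ev` tool here are hypothetical stand-ins, not any real API:

```python
def react_loop(llm, solver, hand_history, max_steps=8):
    """ReAct-style loop: think, optionally call the solver, repeat.

    `llm(prompt)` is assumed to return a dict like
    {"thought": ..., "action": "ev" | "answer", "arg": ...} -- purely hypothetical.
    """
    transcript = f"Hand: {hand_history}\n"
    for _ in range(max_steps):
        step = llm(transcript)                       # model verbalizes its next move
        transcript += f"Thought: {step['thought']}\n"
        if step["action"] == "ev":                   # hand the exact math to the solver
            ev = solver.ev(step["arg"])
            transcript += f"Observation: EV = {ev:.2f} bb\n"
        elif step["action"] == "answer":
            return step["arg"]                       # model commits to a line
    return None  # step budget burned inside EV calcs: the "weeds" failure mode
```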
That’s basically ReAct training data. Important work, but undervalued when treated as annotation.
The illusion cracks a little.
Yoav Farbey (@yoavf.bsky.social) has a hilarious repo on this theme:
github.com/yoavf/absolu...
The output space grows combinatorially, but because nonsense actions are harder to justify in language, effective entropy may shrink.
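A toy calculation of what I mean by "effective entropy", with completely made-up numbers:

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# 1000 syntactically possible actions, all equally likely: high entropy.
uniform = [1 / 1000] * 1000
print(round(entropy(uniform), 2))  # 9.97 bits

# Same action space, but mass concentrated on the few actions the model
# can actually justify in language; the nonsense tail is nearly pruned.
peaked = [0.5, 0.3, 0.15] + [0.05 / 997] * 997
print(round(entropy(peaked), 2))   # 2.15 bits: bigger space, less effective entropy
```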
For me, reasoning with vs. without verbalizing feels about the same — like the same neurons fire either way.
Would love to hear what others think about ARPO — or if you’ve seen clever crossover ideas like using bioinformatics for GenAI training. 👀
ARPO’s “soft advantage attribution” compares rollouts at the token level, but once sequences diverge it gets brittle. Reminds me of sequence alignment in genomics — maybe Smith–Waterman could help here.
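To make the genomics hunch concrete: a bare-bones Smith–Waterman local alignment score over token IDs. Pure speculation on my part, not anything in the ARPO paper, and the scoring constants are placeholders:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between token sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: never go below zero, allow gaps on either side.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# Two rollouts that share a reasoning segment but diverge around an extra token:
r1 = [5, 9, 9, 3, 7, 7, 1, 4]
r2 = [5, 9, 3, 7, 7, 2, 8]
print(smith_waterman(r1, r2))  # 8: shared segment still scores high despite the gap
```

The point being that a gap-tolerant comparison keeps crediting the shared sub-trajectory even after the sequences stop lining up position-by-position.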
Tool calls massively extend LLM capabilities, cut down on hallucinations, and give researchers a natural breakpoint mid-calculation. You can actually peek into the stream of tokens and see what’s happening.
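That "breakpoint" framing can be made literal. A sketch, assuming a generic `run_tool(name, args)` dispatcher rather than any particular framework's API:

```python
import json

def debug_tool_boundary(run_tool):
    """Wrap a tool dispatcher so every call pauses for inspection."""
    def wrapped(name, args):
        print(f"[tool call] {name}({json.dumps(args)})")   # peek at the request
        input("paused mid-calculation, press Enter to continue...")
        result = run_tool(name, args)
        print(f"[tool result] {result!r}")                 # peek at what flows back
        return result
    return wrapped

# Usage: run_tool = debug_tool_boundary(run_tool)
```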
What does it mean to be “agentic”? The line between an algorithm and an agent feels blurry. ARPO suggests it’s about adaptability: agents curate their own tools and reasoning paths.