LLMs changed everything. Now every company has AI on its agenda. The market exploded.
But a dangerous gap is forming between C-level expectations and actual delivery capability.
Maybe AI is the same: don't tell it what's perfect. Just help it avoid what's wrong.
📄 arxiv.org/pdf/2506.01347
It's essential for long-term performance.
There's something almost Winnicott-ian here...
They tell you what to avoid rather than what to obsess over.
The deeper principle is that entropy is a resource. Once collapsed, it's nearly impossible to recover.
We've battled this tradeoff for years: exploitation vs. exploration.
Too much positive reinforcement creates filter bubbles, trapping users in increasingly narrow content loops.
Rewarding correct answers creates overconfidence. The model locks onto specific paths and loses its ability to explore alternatives.
Give it more attempts per problem? It actually outperforms everything.
The hypothesis is: punishment preserves diversity.
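To make that hypothesis concrete, here's a toy sketch (my own illustration, not code from the paper): a softmax policy over ten candidate answers, three of which are acceptable, trained with REINFORCE-style updates that either only reward sampled correct answers or only penalize sampled wrong ones. All names and settings here are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def entropy(p):
    """Shannon entropy of a distribution (nats)."""
    return float(-np.sum(p * np.log(p + 1e-12)))

def train(mode, steps=2000, lr=0.5, n=10, correct=(0, 1, 2)):
    """Toy REINFORCE on a softmax policy over n candidate answers.

    mode="positive": only reinforce sampled correct answers.
    mode="negative": only penalize sampled wrong answers.
    """
    logits = np.zeros(n)
    for _ in range(steps):
        p = softmax(logits)
        a = rng.choice(n, p=p)
        grad = -p
        grad[a] += 1.0  # gradient of log p(a) w.r.t. the logits
        if mode == "positive" and a in correct:
            logits += lr * grad   # push up the sampled correct answer
        elif mode == "negative" and a not in correct:
            logits -= lr * grad   # push down the sampled wrong answer
    return entropy(softmax(logits))

h_pos = train("positive")
h_neg = train("negative")
print(f"final entropy, positive-only: {h_pos:.3f}")
print(f"final entropy, negative-only: {h_neg:.3f}")
```

The expected dynamic: positive-only updates have rich-get-richer behavior and tend to collapse onto a single correct answer (entropy toward zero), while negative-only updates just drain mass from the wrong answers and leave the spread across acceptable answers roughly intact — the diversity the hypothesis is pointing at.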