haokunliu.com
Story CoT:
github.com/ChicagoHAI/s... github.com/ChicagoHAI/s...
Anomalous Belief Shifts:
github.com/ChicagoHAI/a...
github.com/ChicagoHAI/a...
Coverage vs Efficiency:
github.com/ChicagoHAI/l...
github.com/ChicagoHAI/l...
Story CoT:
github.com/ChicagoHAI/s... github.com/ChicagoHAI/s...
Anomalous Belief Shifts:
github.com/ChicagoHAI/a...
github.com/ChicagoHAI/a...
Coverage vs Efficiency:
github.com/ChicagoHAI/l...
github.com/ChicagoHAI/l...
Coverage vs Efficiency: Using coding challenge benchmarks, both agents reached the conclusion that weaker models benefit from coverage, stronger models nail it first try.
Coverage vs Efficiency: Using coding challenge benchmarks, both agents reached the conclusion that weaker models benefit from coverage, stronger models nail it first try.
Anomalous Belief Shifts: Baselines define "anomalous": detected 77% sycophancy vs 85% factual consistency
Anomalous Belief Shifts: Baselines define "anomalous": detected 77% sycophancy vs 85% factual consistency
Story CoT: Floor effects matter: test at intermediate difficulty, not when everything struggles
Story CoT: Floor effects matter: test at intermediate difficulty, not when everything struggles
* Upgraded idea-explorer with resource finder—now pulls relevant papers, datasets & code
* Experiments use real datasets & existing code (less BS, more trust!)
Still Missing:
* Asking the right research questions
* Careful ablation studies
* Knowing when to seek external sources
* Upgraded idea-explorer with resource finder—now pulls relevant papers, datasets & code
* Experiments use real datasets & existing code (less BS, more trust!)
Still Missing:
* Asking the right research questions
* Careful ablation studies
* Knowing when to seek external sources
Story CoT: Narrative-Based Chain-of-Thought Reasoning
Anomalous Belief Shifts: Detecting Inappropriate Belief Changes
Coverage vs Efficiency: What Do LLMs Actually Improve?
Story CoT: Narrative-Based Chain-of-Thought Reasoning
Anomalous Belief Shifts: Detecting Inappropriate Belief Changes
Coverage vs Efficiency: What Do LLMs Actually Improve?
- github.com/ChicagoHAI/l...
- github.com/ChicagoHAI/l...
- github.com/ChicagoHAI/i...
- github.com/ChicagoHAI/i...
- github.com/ChicagoHAI/l...
- github.com/ChicagoHAI/l...
- github.com/ChicagoHAI/l...
- github.com/ChicagoHAI/l...
- github.com/ChicagoHAI/i...
- github.com/ChicagoHAI/i...
- github.com/ChicagoHAI/l...
- github.com/ChicagoHAI/l...
Full blog with technical details:
hypogenic.ai/blog/weekly-...
Substack: open.substack.com/pub/cichicag...
Full blog with technical details:
hypogenic.ai/blog/weekly-...
Substack: open.substack.com/pub/cichicag...
Because agents clearly can accelerate early-stage exploration. But they need human oversight at every step. Transparent benchmarking beats cherry-picked demos. Community feedback improves agents faster. And honestly, we're all figuring this out together.
Because agents clearly can accelerate early-stage exploration. But they need human oversight at every step. Transparent benchmarking beats cherry-picked demos. Community feedback improves agents faster. And honestly, we're all figuring this out together.
That's what I call the "meta intelligence" gap.
That's what I call the "meta intelligence" gap.
Some agents run faked human data, used undersized models even though compute was available, or calling simple answer reweighting as "multi-agent interactions". Resource collection and allocation is a bottleneck, but more importantly, the agents do not know when to search or seek help.
Some agents run faked human data, used undersized models even though compute was available, or calling simple answer reweighting as "multi-agent interactions". Resource collection and allocation is a bottleneck, but more importantly, the agents do not know when to search or seek help.
Agents can actually design and run small experiments: sometimes to seed bigger studies, sometimes as sanity checks, and sometimes to straight-up refute the original hypothesis. That kind of evidence is way more useful than “LLM-as-a-judge says the idea is good.”
Agents can actually design and run small experiments: sometimes to seed bigger studies, sometimes as sanity checks, and sometimes to straight-up refute the original hypothesis. That kind of evidence is way more useful than “LLM-as-a-judge says the idea is good.”
Do LLMs have different types of beliefs?
Can formal rules make AI agents honest about their uncertainty?
Can LLMs temporarily ignore their training to follow new rules?
Do LLMs have different types of beliefs?
Can formal rules make AI agents honest about their uncertainty?
Can LLMs temporarily ignore their training to follow new rules?
→ Submit your research idea or upvote existing ones (tag: "Weekly Competition")
→ Each Monday we select top 3 from previous week
→ We run experiments using research agents
→ Share repos + findings back on IdeaHub
Vote here: hypogenic.ai/ideahub
→ Submit your research idea or upvote existing ones (tag: "Weekly Competition")
→ Each Monday we select top 3 from previous week
→ We run experiments using research agents
→ Share repos + findings back on IdeaHub
Vote here: hypogenic.ai/ideahub