
Key findings from ideas (3/3):

Coverage vs Efficiency: Using coding-challenge benchmarks, both agents concluded that weaker models benefit from coverage, while stronger models nail it on the first try.

November 25, 2025 at 8:10 PM

Haokun Liu

@haokunliu.bsky.social

Key findings from ideas (2/3):

Anomalous Belief Shifts: Baselines define what counts as "anomalous": 77% detected for sycophancy vs. 85% for factual consistency

November 25, 2025 at 8:09 PM

Haokun Liu

@haokunliu.bsky.social

Key findings from ideas (1/3):

Story CoT: Floor effects matter: test at intermediate difficulty, not where every model struggles

November 25, 2025 at 8:09 PM

Haokun Liu

@haokunliu.bsky.social

Agent Improvements:
* Upgraded idea-explorer with a resource finder: it now pulls relevant papers, datasets & code
* Experiments now use real datasets & existing code (less BS, more trust!)

Still Missing:
* Asking the right research questions
* Careful ablation studies
* Knowing when to seek external sources

November 25, 2025 at 8:08 PM

Haokun Liu

@haokunliu.bsky.social

Three winning ideas this week:
Story CoT: Narrative-Based Chain-of-Thought Reasoning

Anomalous Belief Shifts: Detecting Inappropriate Belief Changes

Coverage vs Efficiency: What Do LLMs Actually Improve?

November 25, 2025 at 8:08 PM

Haokun Liu

@haokunliu.bsky.social

Big thanks to the @chicagohai.bsky.social team and everyone who submitted ideas on IdeaHub. Special shoutout to the open-source community building research agents! We're all learning together.

November 10, 2025 at 10:46 PM

Haokun Liu

@haokunliu.bsky.social

All 6 generated repositories with detailed code and reports:
- github.com/ChicagoHAI/l...
- github.com/ChicagoHAI/l...
- github.com/ChicagoHAI/i...
- github.com/ChicagoHAI/i...
- github.com/ChicagoHAI/l...
- github.com/ChicagoHAI/l...

November 10, 2025 at 10:45 PM

Haokun Liu

@haokunliu.bsky.social

Submit your idea, vote on existing ones, or help improve idea-explorer: github.com/ChicagoHAI/i...

Full blog with technical details:
hypogenic.ai/blog/weekly-...
Substack: open.substack.com/pub/cichicag...

Hypogenic AI - Shaping the Future of Science

Reimagining science by augmenting scientist-AI collaboration.

hypogenic.ai

November 10, 2025 at 10:45 PM

Haokun Liu

@haokunliu.bsky.social

So why are we doing this openly?

Because agents can clearly accelerate early-stage exploration, but they need human oversight at every step. Transparent benchmarking beats cherry-picked demos. Community feedback improves agents faster. And honestly, we're all figuring this out together.

November 10, 2025 at 10:44 PM

Haokun Liu

@haokunliu.bsky.social

Existing agents like AI-Scientist and AI-Researcher are basically overfitted to ML. They contain hard-coded prompts that "require changing hyperparameters and training on HuggingFace datasets," or they rely on specialized ML agents. Just changing the prompt won't be enough, because ML assumptions are baked in throughout the codebase.

November 10, 2025 at 10:44 PM

Haokun Liu

@haokunliu.bsky.social

The pattern: we can fix specific bugs with better prompts (bias-variance tradeoff). But we can't prompt our way to knowing when to search, recognizing expertise boundaries, or understanding what rigorous methodology looks like.

That's what I call the "meta intelligence" gap.

November 10, 2025 at 10:43 PM

Haokun Liu

@haokunliu.bsky.social

What didn't work

Some agents used faked human data, ran undersized models even though compute was available, or labeled simple answer reweighting as "multi-agent interactions." Resource collection and allocation is a bottleneck, but more importantly, the agents don't know when to search or seek help.

November 10, 2025 at 10:43 PM

Haokun Liu

@haokunliu.bsky.social

What worked

Agents can actually design and run small experiments: sometimes to seed bigger studies, sometimes as sanity checks, and sometimes to straight-up refute the original hypothesis. That kind of evidence is way more useful than “LLM-as-a-judge says the idea is good.”

November 10, 2025 at 10:40 PM

Haokun Liu

@haokunliu.bsky.social

There's a lot of hype around AI agents for science. But what can they actually do? We tested our idea-explorer on ideas from IdeaHub:

Do LLMs have different types of beliefs?
Can formal rules make AI agents honest about their uncertainty?
Can LLMs temporarily ignore their training to follow new rules?

November 10, 2025 at 10:35 PM

Haokun Liu

@haokunliu.bsky.social

Here's how it works:
→ Submit your research idea or upvote existing ones (tag: "Weekly Competition")
→ Each Monday we select the top 3 ideas from the previous week
→ We run experiments using research agents
→ We share repos + findings back on IdeaHub

Vote here: hypogenic.ai/ideahub

November 10, 2025 at 9:33 PM
