Lightnews — Scholar-powered news

shane caldwell

@shanecaldwell.bsky.social

11 followers 25 following 4 posts

synthetic data, RL, hackbots - writing at https://hackbot.dad/


shane caldwell

@shanecaldwell.bsky.social

If you don’t have homemade priors, store bought is fine.

Dreadnode @dreadnode.bsky.social · 11h

We fine-tuned an 8B model to pop a GOAD domain…using only synthetic training data. No real networks. No frontier model distillation. Just a world model that simulates AD environments and generates realistic pentesting trajectories.

See how we did it: dreadnode.io/blog/worlds-...

Worlds: A Simulation Engine for Agentic Pentesting

An 8B model went from blindly loading Metasploit modules to achieving Domain Admin on GOAD, trained entirely on synthetic data from our world model system.

dreadnode.io

February 11, 2026 at 5:01 PM

shane caldwell

@shanecaldwell.bsky.social

new blog: RL Needed LLMs Because Agency Requires Priors

Mostly a retrospective on how I mourned RL after AlphaZero and how much better it feels that it's back.

If you weren't working with DQNs it's hard to appreciate just how well things work with LLMs.

hackbot.dad/writing/rl-l...

Screenshot of beginning of blog post: RL Needed LLMs Because Agency Requires Priors

August 25, 2025 at 2:16 PM

shane caldwell

@shanecaldwell.bsky.social

GPT-5 had a lot of mixed reactions over the last week or so and I wanted to talk about:

- Chart crime
- Why I barely read the model card anymore
- Why public benchmarks aren't very relevant to you, and you should invest the time in building something custom

hackbot.dad/writing/agon...

GPT-5 is Good, Actually: The Agony and Ecstasy of Public Benchmarks

An attempt to explain why benchmarks are either bad or secret, and why the bar charts don't matter so much.

hackbot.dad

August 18, 2025 at 12:07 AM

Reposted by shane caldwell

Dreadnode

@dreadnode.bsky.social

Incoming: Dreadnode paper drop from Shane Caldwell and the crew.

PentestJudge—Judging Agent Behavior Against Operational Requirements: arxiv.org/abs/2508.02921

Explore how we built an LLM-as-judge system for evaluating the operations of pentesting agents (inspired by PaperBench).

August 6, 2025 at 6:31 PM
