shane caldwell
shanecaldwell.bsky.social
synthetic data, RL, hackbots - writing at https://hackbot.dad/
If you don’t have homemade priors, store bought is fine.
We fine-tuned an 8B model to pop a GOAD domain…using only synthetic training data. No real networks. No frontier model distillation. Just a world model that simulates AD environments and generates realistic pentesting trajectories.

See how we did it: dreadnode.io/blog/worlds-...
Worlds: A Simulation Engine for Agentic Pentesting
An 8B model went from blindly loading Metasploit modules to achieving Domain Admin on GOAD, trained entirely on synthetic data from our world model system.
February 11, 2026 at 5:01 PM
new blog: RL Needed LLMs Because Agency Requires Priors

Mostly a retrospective on how I mourned RL after AlphaZero and how much better it feels that it's back.

If you weren't working with DQNs it's hard to appreciate just how well things work with LLMs.

hackbot.dad/writing/rl-l...
August 25, 2025 at 2:16 PM
GPT-5 got a lot of mixed reactions over the last week or so, and I wanted to talk about:

- Chart crime
- Why I barely read the model card anymore
- Why public benchmarks aren't very relevant to you, and you should invest the time in building something custom

hackbot.dad/writing/agon...
GPT-5 is Good, Actually: The Agony and Ecstasy of Public Benchmarks
An attempt to explain why benchmarks are either bad or secret, and why the bar charts don't matter so much.
August 18, 2025 at 12:07 AM
Reposted by shane caldwell
Incoming: Dreadnode paper drop from Shane Caldwell and the crew.

PentestJudge: Judging Agent Behavior Against Operational Requirements: arxiv.org/abs/2508.02921

Explore how we built an LLM-as-judge system for evaluating the operations of pentesting agents (inspired by PaperBench).
August 6, 2025 at 6:31 PM