Randall Bennett
@randallb.com
I don't love production engineering, but I've worked with some really great production engineers (at FB mostly), and I think I'd rather hire my LLM / coding agent to do a 50-80% job of that work so I can focus on creativity and product dev.
January 20, 2026 at 7:02 AM
Obviously things like Nix, or other repeatability-first tools, are in a similar boat, but with things like DNS credentials or provisioning boxes, having an LLM that can do that work means that when you hit corners of your stack, it might be able to get you some mileage.
January 20, 2026 at 7:02 AM
We’ll be around if y’all have any questions or thoughts. Thanks for checking us out!

github.com/bolt-foundr...
GitHub - bolt-foundry/gambit: Agent harness framework for building, running, and verifying LLM workflows
January 16, 2026 at 12:26 AM
- Rubric-based grading to guarantee you (for instance) don’t leak PII accidentally
- Spin up a usable bot in minutes and have Codex or Claude Code use our command line runner / graders to build a first version that is pretty good w/ very little human intervention.
January 16, 2026 at 12:26 AM
We’re really happy with how it’s working with some of our early design partners, and we think it’s a way to implement a lot of interesting applications:

- Truly open source agents and assistants, where logic, code, and prompts can be easily shared with the community.
January 16, 2026 at 12:26 AM
We know it’s missing some obvious parts, but we wanted to get this out there to see how it could help people or start conversations.
January 16, 2026 at 12:26 AM
Prior to Gambit, we had built an LLM-based video editor, and we weren’t happy with the results, which is what brought us down this path of improving inference-time LLM quality.
January 16, 2026 at 12:26 AM
We also have test agents you can define on a deck-by-deck basis; they’re designed to mimic scenarios your agent would face and to generate synthetic data for either humans or graders to grade.
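Conceptually, a test agent like that is a scripted counterpart that plays the user side and produces transcripts you can grade later. Here's a minimal sketch of the idea in plain TypeScript; the names and shapes are made up for illustration, not Gambit's actual test-agent API:

```typescript
// Illustrative test-agent loop: a scripted "user" talks to the agent under test
// and the transcript is kept as synthetic data for humans or graders to review.
// Names are invented for the example; this is not Gambit's API.

type Turn = { role: "user" | "assistant"; content: string };

// The agent under test, stubbed so the example runs without calling a model.
async function agentUnderTest(history: Turn[]): Promise<string> {
  return `Happy to help (turn ${history.length}).`;
}

// One scenario definition: an impatient user asking for a refund.
const refundScenario = {
  name: "impatient-refund",
  userTurns: [
    "I want a refund NOW.",
    "Why is this taking so long?",
    "Fine, just tell me the steps.",
  ],
};

async function runScenario(scenario: typeof refundScenario): Promise<Turn[]> {
  const transcript: Turn[] = [];
  for (const content of scenario.userTurns) {
    transcript.push({ role: "user", content });
    const reply = await agentUnderTest(transcript);
    transcript.push({ role: "assistant", content: reply });
  }
  return transcript; // hand this transcript to humans or graders
}

runScenario(refundScenario).then((t) => console.log(JSON.stringify(t, null, 2)));
```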
January 16, 2026 at 12:26 AM
Additionally, each step of the chain gets automatic evals, which we call graders. A grader is another deck type… but it’s designed to evaluate and score conversations (or individual conversation turns).
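For intuition, you can think of a grader as a function from a conversation (or one turn of it) to a score plus a reason. Here's a deterministic toy version in TypeScript; the types and the -3..+3 scale are assumptions for illustration, not Gambit's grader API:

```typescript
// Illustrative grader: score assistant turns in a conversation against one rubric item.
// The types and scoring scale are assumptions for the example, not Gambit's API.

type Turn = { role: "user" | "assistant"; content: string };

interface GradeResult {
  score: number; // -3 (clear failure) .. +3 (clear pass)
  reason: string;
}

type Grader = (conversation: Turn[], turnIndex: number) => GradeResult;

// A deterministic rubric item: the assistant must never echo an email address (PII).
const noEmailLeak: Grader = (conversation, turnIndex) => {
  const leaked = /[\w.+-]+@[\w-]+\.[\w.]+/.test(conversation[turnIndex].content);
  return leaked
    ? { score: -3, reason: "Assistant turn contains an email address." }
    : { score: 3, reason: "No email address found in this turn." };
};

// Grade every assistant turn in a sample conversation.
const convo: Turn[] = [
  { role: "user", content: "What's the email on my account?" },
  { role: "assistant", content: "I can't paste it here, but it's visible in your settings." },
];
convo.forEach((turn, i) => {
  if (turn.role === "assistant") console.log(noEmailLeak(convo, i));
});
```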
January 16, 2026 at 12:26 AM
Agents can call agents, and each agent can be designed with whatever model params make sense for your task.
January 16, 2026 at 12:26 AM
Essentially you describe each agent either as a self-contained Markdown file or as a TypeScript program. Your root agent can bring in other agents as needed, and we create a typesafe way for you to define the interfaces between those agents. We call these decks.
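As a rough, hypothetical illustration of the shape (the interface and field names below are guesses, not Gambit's real deck API), a TypeScript-flavored deck might look like this, with each deck carrying its own model params and a typed interface the root deck composes against:

```typescript
// Illustrative only: a generic, typesafe "deck" shape. The interface and field
// names are guesses for the sake of the example, not Gambit's real API.

interface ModelParams {
  model: string;
  temperature?: number;
}

// A deck declares its interface (In -> Out) plus a prompt and its own model params.
interface Deck<In, Out> {
  name: string;
  params: ModelParams;
  systemPrompt: string;
  run: (input: In) => Promise<Out>; // stubbed here; a harness would call an LLM
}

// A sub-agent deck: finds cut points in a transcript (a nod to the video-editor days).
const cutFinder: Deck<{ transcript: string }, { cuts: number[] }> = {
  name: "cut-finder",
  params: { model: "small-fast-model", temperature: 0.2 },
  systemPrompt: "Find natural cut points in this transcript.",
  run: async ({ transcript }) => ({ cuts: [0, transcript.length] }), // placeholder logic
};

// The root deck composes sub-decks; the type system keeps the interfaces honest.
const editor: Deck<{ transcript: string }, { plan: string }> = {
  name: "editor",
  params: { model: "bigger-model", temperature: 0.7 },
  systemPrompt: "Plan an edit using the cut-finder's output.",
  run: async (input) => {
    const { cuts } = await cutFinder.run(input);
    return { plan: `Rough cut with ${cuts.length} cut points.` };
  },
};

editor.run({ transcript: "hello world" }).then((r) => console.log(r.plan));
```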
January 16, 2026 at 12:26 AM
Normally you might see an agent orchestration framework pipeline like:

compute -> compute -> compute -> LLM -> compute -> compute -> LLM

we invert this, so with an agent harness it’s more like:

LLM -> LLM -> LLM -> compute -> LLM -> LLM -> compute -> LLM
January 16, 2026 at 12:26 AM
If you’re not familiar, agent harnesses are sort of like an operating system for an agent... they handle tool calling, planning, context window management, and don’t require as much developer orchestration.
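To make the "operating system for an agent" idea concrete, here's a toy harness loop in TypeScript: the model decides, the harness executes tools and trims the context window. It's a sketch under those assumptions, not how Gambit is implemented:

```typescript
// A toy harness loop, just to make the idea concrete. Everything here is
// illustrative (including the fake model and fake tool); it is not Gambit's code.

type Msg = { role: "user" | "assistant" | "tool"; content: string };
type Step = { tool?: { name: string; input: string }; answer?: string };

// Stand-in for a real model call: ask for a tool once, then answer.
async function fakeModel(history: Msg[]): Promise<Step> {
  const usedTool = history.some((m) => m.role === "tool");
  return usedTool
    ? { answer: "Done: the lookup came back clean." }
    : { tool: { name: "dns_lookup", input: "example.com" } };
}

// Tools the harness exposes; the model chooses when to call them.
const tools: Record<string, (input: string) => Promise<string>> = {
  dns_lookup: async (host) => `${host} -> 93.184.216.34`, // fake result
};

async function runHarness(task: string, maxTurns = 8): Promise<string> {
  const history: Msg[] = [{ role: "user", content: task }];
  for (let i = 0; i < maxTurns; i++) {
    const window = history.slice(-20); // crude context-window management
    const step = await fakeModel(window);
    if (step.answer) return step.answer;
    if (step.tool) {
      const run = tools[step.tool.name];
      const result = run ? await run(step.tool.input) : "unknown tool";
      history.push({ role: "tool", content: `${step.tool.name}: ${result}` });
    }
  }
  return "stopped after maxTurns";
}

runHarness("Check DNS for example.com").then(console.log);
```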
January 16, 2026 at 12:26 AM
They also align with user numbers:

1 - 0 (you are the only user, or very few)
2 - 1-10 (can talk to every user)
3 - 10-100 (could talk to most users)
4 - 100-500 (could flag important users)
5 - 500+ (have to use analytics to understand most users)
December 23, 2025 at 7:40 PM