Lightnews — Scholar-powered news

"forces" is doing a lot of work in that booklet

If that piece of the prompt falls out of context, for whatever reason, you lose tool_use specificity

This is why all of major providers have a "tool=[]" array as one of the parameters for the API call. Prompt injection tool_use only gets you so far

April 13, 2025 at 3:35 AM

GödelEscherSpock 🦿

@adam-hill.bsky.social

In fact, we think that a new metric in the SWE Bench should be Instruction Following,

You do not suck at prompting, some LLMs just like to ignore us occasionally.

April 10, 2025 at 4:50 PM

GödelEscherSpock 🦿

@adam-hill.bsky.social

Even if we constantly remind LLMs to do X is gives us a big - NOPE. :-)

One of the biggest offenders is Gemini - if we tell it to do 1 simple binary thing every prompt - "Do not write comments for code you write – EVER" it still will do it 7 times out of 10. But... still writes pretty good code

April 10, 2025 at 4:50 PM

GödelEscherSpock 🦿

@adam-hill.bsky.social

Nope, same experience on the OSS agentic programming plugin - Roo Code, we regularly swap in / out new models as they roll out daily and find the same thing.

Less hallucination (since we have full control of context) but forgetting instructions is a very big deal.

April 10, 2025 at 4:50 PM

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news