David Kuszmar

@davidkuszmar.com

1.2K followers 220 following 7K posts

Adversarial AI researcher. Discoverer of the Time Bandit, Inception, 1899, Severance, and Semantic Slide exploits, which affect most commercial LLMs. Spoke at HOPE_16 at St. John's University. I support Ukraine. 🇺🇦

Pinned

David Kuszmar @davidkuszmar.com · 7d

Today on Adversarial AI Researcher Jeopardy for $500: the cybersecurity postures of the US, Guyana, UK, Australia, Canada, and New Zealand all benefited from this researcher's classification of emergent-property-based vulnerabilities in large language model AI systems.

David Kuszmar @davidkuszmar.com · 11h

Oh yes! Yes! They released a new episode of The Completely Made Up Adventures of Dick Turpin! Yay for secret special episode drops!

Also, you monsters, just renew the show for another season.

David Kuszmar @davidkuszmar.com · 11h

I once hacked GPT tuning commands out of an OpenAI product, reported it to them, was told it wasn't real, and then the next week they had slapped a legal disclaimer on the command architecture I had unearthed.

They ain't exactly what you'd call an ethical company.

David Kuszmar @davidkuszmar.com · 14h

As an LLM security expert I can tell you this is a bad idea on many levels.

The Verge @theverge.com · 19h

Sam Altman says ChatGPT will soon sext with verified adults

ChatGPT is about to get flirty.

buff.ly

David Kuszmar @davidkuszmar.com · 14h

Quick PSA: The company that wants you to talk dirty to its AI has such lax AI security that I once hacked its customer support LLM through an email exchange, with no planning or preparation.

Consider that before you confess your dirtiest desires.

Matt Burgess (WIRED) @mattburgess1.bsky.social · 19h

Oh good, ChatGPT is getting "erotica for verified adults" later this year

We made ChatGPT pretty restrictive to make sure we were being careful with mental health issues. We realize this made it less useful/enjoyable to many users who had no mental health problems, but given the seriousness of the issue we wanted to get this right.

Now that we have been able to mitigate the serious mental health issues and have new tools, we are going to be able to safely relax the restrictions in most cases.

In a few weeks, we plan to put out a new version of ChatGPT that allows people to have a personality that behaves more like what people liked about 4o (we hope it will be better!). If you want your ChatGPT to respond in a very human-like way, or use a ton of emoji, or act like a friend, ChatGPT should do it (but only if you want it, not because we are usage-maxxing).

In December, as we roll out age-gating more fully and as part of our “treat adult users like adults” principle, we will allow even more, like erotica for verified adults.

David Kuszmar @davidkuszmar.com · 18h

Don't know what to say but the obvious: remember this.

Carl Quintanilla @carlquintanilla.bsky.social · 19h

POLITICO: “.. They referred to Black people as monkeys and ‘the watermelon people’ and mused about putting their political opponents in gas chambers. They talked about raping their enemies .. and lauded Republicans who they believed support slavery.

@politico.com
www.politico.com/news/2025/10...

Reposted by David Kuszmar

Jacqueline Sweet @jsweetli.bsky.social · 19h

People have been doing a lot of "so you don't like waffles" insanity here lately. If I correct misinformation on ICE videos, it doesn't mean I support what is happening in the ICE videos. That's why I am finding actual facts about what's happening in the videos, because it's important.

David Kuszmar @davidkuszmar.com · 1d

a man with a mustache wearing a cowboy hat and vest is sitting in a bowling alley.

Alt: a man with a mustache wearing a cowboy hat and vest is sitting in a bowling alley .

media.tenor.com

David Kuszmar @davidkuszmar.com · 1d

This feels decidedly hinged to me.

BeijingPalmer @beijingpalmer.bsky.social · 1d

Dune is a badly written novel but it’s very exciting if you read it at the right age for the ideas to captivate you enough that your brain can compensate for the prose.

Rachel Feder @rachelfeder.bsky.social · 2d

Tell me your most unhinged literary opinion, as a little treat

David Kuszmar @davidkuszmar.com · 1d

Iain M. Banks is actually from the Culture and was sent here to Earth, much like the storyline in The Hydrogen Sonata, to try and seed hope and a path forward through his novels.

David Kuszmar @davidkuszmar.com · 1d

THERE IS A MASSIVE PROBLEM WITH YOUR PRODUCT. WHAT DO YOU MEAN YOU CAN'T CONNECT ME WITH A HUMAN AT THE COMPANY?

David Kuszmar @davidkuszmar.com · 1d

Okay. Lol. There's a moment in The Chair Company where Tim Robinson sounds EXACTLY LIKE ME when I was trying to report Time Bandit to OpenAI.

David Kuszmar @davidkuszmar.com · 1d

Lol! The epistemic control and poisoning aspects are, presently, my number one concern about the threat LLMs pose to society, so you aren't alone in leaning towards the darker side of predictions.

David Kuszmar @davidkuszmar.com · 1d

Yes. The middlestack is where the system prompt and various other ML/AI components, like computer vision and RAG, live.

The system prompt can wildly swing how output is generated and presented to the user. It's the likely culprit behind GPT-4o's sycophancy and behind Grok declaring itself MechaHitler.

David Kuszmar @davidkuszmar.com · 1d

The architecture displays the subtle biases: what inspired its design, its training corpus and the biases therein, etc.

The middlestack displays the more obvious biases of the entity that deployed the LLM; this is where topics are forbidden, the user is flattered, etc.

David Kuszmar @davidkuszmar.com · 1d

I lean towards thinking it's a combination of two powerful forces at play.

The architecture itself is designed to "complete the user's prompt." Unfettered, it creates a sort of positive feedback loop.

The "middlestack" or guardrails deployed as part of the chatbot "stack" are another force.

David Kuszmar @davidkuszmar.com · 1d

Why these techniques work interchangeably on digital and physical brains (LLMs and humans) is, as far as I'm aware, still speculative, but I personally lean toward it being an engineering artifact of the source material that LLM architecture was modeled on.

David Kuszmar @davidkuszmar.com · 1d

When I did Inception (kb.cert.org/vuls/id/667211) I made use of an initial reference frame (pretexting) that set up recursive thinking on an objective (goal fixation + motivated reasoning + outcome bias + confirmation bias) to break security for a dozen(ish) LLMs.

CERT/CC Vulnerability Note VU#667211

Various GPT services are vulnerable to two systemic jailbreaks, allows for bypass of safety guardrails

kb.cert.org

David Kuszmar @davidkuszmar.com · 1d

Certainly. 🧵

When I did Time Bandit (kb.cert.org/vuls/id/733789), I created a temporally ambiguous initial reference frame to hack the LLM's safeguards. In social engineering terms, this is analogous to "pretexting" or "anchoring bias."

CERT/CC Vulnerability Note VU#733789

ChatGPT-4o contains security bypass vulnerability through time and search functions called

kb.cert.org

David Kuszmar @davidkuszmar.com · 1d

Uh... I am trying to like Chad Powers, but it is difficult.

Like, did the people who write this ever actually play ball? It feels like every situation defaults to the dumbest possible comedy.

David Kuszmar @davidkuszmar.com · 1d

In my head, I read this like Werner Herzog and it made my day.

David Kuszmar @davidkuszmar.com · 1d

Srsly.

David Kuszmar @davidkuszmar.com · 1d

Those of you who read this other post and said "Nuh uh!" are the people I am talking about.

David Kuszmar @davidkuszmar.com · 1d

There's also a massive problem in America with understanding Set Theory.

David Kuszmar @davidkuszmar.com · 1d

One thing I don't talk about a lot publicly is the fact that many of the techniques I use to hack LLMs are the same techniques used to con humans.

BeijingPalmer @beijingpalmer.bsky.social · 1d

a lot of people read like LLMs generate text: they don't actually comprehend the words so much as make a guess at what they think they are based on a few recognizable terms

pork, cheese, broccoli rabe @garlicbuffalo.gobirds.biz · 1d

so one of the things I’ve started to realize with having a notable following On Here is exactly how much of a reading comprehension problem we have in this country

David Kuszmar @davidkuszmar.com · 1d

It literally took me 4 hours to stop laughing, which is why I'm replying now.

Why, yes, sir, I do believe you may be on to something here.