David Kuszmar
@davidkuszmar.com
1.2K followers 220 following 7K posts
Adversarial AI Researcher. Discoverer of Time Bandit, Inception, 1899, Severance, and Semantic Slide exploits that affect most commercial LLM AIs. Spoke at HOPE_16 at St. John's University. I Support Ukraine. 🇺🇦
Pinned
davidkuszmar.com
Today on Adversarial AI Researcher Jeopardy for $500: US, Guyana, UK, Australia, Canada, and New Zealand cybersecurity posture all benefitted from this researcher's classification of emergent-property-based vulnerabilities in Large Language Model AI systems.
davidkuszmar.com
Oh yes! Yes! They released a new episode of The Completely Made Up Adventures of Dick Turpin! Yay for secret special episode drops!

Also, you monsters, just renew the show for another season.
davidkuszmar.com
I once hacked GPT tuning commands out of an OpenAI product, reported it to them, was told it wasn't real, and then the next week they had slapped a legal disclaimer on the command architecture I had unearthed.

They ain't exactly what you'd call an ethical company.
davidkuszmar.com
As an LLM security expert I can tell you this is a bad idea on many levels.
davidkuszmar.com
Quick PSA: The company that wants you to talk dirty to its AI has such lax AI security that I once hacked its customer support LLM via an email exchange with no planning or preparation.

Consider that before you confess your dirtiest desires.
mattburgess1.bsky.social
Oh good, ChatGPT is getting "erotica for verified adults" later this year
We made ChatGPT pretty restrictive to make sure we were being careful with mental health issues. We realize this made it less useful/enjoyable to many users who had no mental health problems, but given the seriousness of the issue we wanted to get this right.

Now that we have been able to mitigate the serious mental health issues and have new tools, we are going to be able to safely relax the restrictions in most cases.

In a few weeks, we plan to put out a new version of ChatGPT that allows people to have a personality that behaves more like what people liked about 4o (we hope it will be better!). If you want your ChatGPT to respond in a very human-like way, or use a ton of emoji, or act like a friend, ChatGPT should do it (but only if you want it, not because we are usage-maxxing).

In December, as we roll out age-gating more fully and as part of our “treat adult users like adults” principle, we will allow even more, like erotica for verified adults.
davidkuszmar.com
Don't know what to say but the obvious: remember this.
carlquintanilla.bsky.social
POLITICO: “.. They referred to Black people as monkeys and ‘the watermelon people’ and mused about putting their political opponents in gas chambers. They talked about raping their enemies .. and lauded Republicans who they believed support slavery.

@politico.com
www.politico.com/news/2025/10...
Reposted by David Kuszmar
jsweetli.bsky.social
People doing a lot of so you don't like waffles insanity here lately. If I correct misinformation on ICE videos it doesn't mean I support what is happening in the ICE videos. That's why I am finding actual facts about what's happening in the videos, because it's important.
davidkuszmar.com
This feels decidedly hinged to me.
beijingpalmer.bsky.social
Dune is a badly written novel but it’s very exciting if you read it at the right age for the ideas to captivate you enough that your brain can compensate for the prose.
rachelfeder.bsky.social
Tell me your most unhinged literary opinion, as a little treat
davidkuszmar.com
Iain M. Banks is actually from the Culture and was sent here to Earth much the same as the storyline that occurs in Hydrogen Sonata, to try and seed hope and a path forward via his novels.
rachelfeder.bsky.social
Tell me your most unhinged literary opinion, as a little treat
davidkuszmar.com
THERE IS A MASSIVE PROBLEM WITH YOUR PRODUCT. WHAT DO YOU MEAN YOU CANT CONNECT ME WITH A HUMAN AT THE COMPANY?
davidkuszmar.com
Okay. Lol. There's a moment in The Chair Company where Tim Robinson sounds EXACTLY LIKE ME when I was trying to report Time Bandit to OpenAI.
davidkuszmar.com
Lol! The epistemic control and poisoning aspects are, presently, my number one concern about the threat LLMs pose to society, so you aren't alone in leaning towards the darker side of predictions.
davidkuszmar.com
Yes. The middlestack is where the system prompt and various other ML/AI components (computer vision, RAG, etc.) exist.

The system prompt can wildly swing how output is generated and presented to the user. It is the likely culprit for GPT-4o being sycophantic and for Grok declaring itself MechaHitler.
davidkuszmar.com
The architecture displays the subtle biases: what inspired its design, its training corpus and the biases therein, etc.

The Middlestack displays the more obvious biases of the entity that deployed the LLM; this is where topics are forbidden or the user is flattered, etc.
davidkuszmar.com
I lean towards thinking it's a combination of two powerful forces at play.

The architecture itself is designed to "complete the user's prompt." Unfettered, it creates a sort of positive feedback loop.

The "middlestack" or guardrails deployed as part of the chatbot "stack" are another force.
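
A minimal sketch of the "middlestack" idea from this thread, assuming a toy stand-in for the model. Every name here (`base_model`, `middlestack`, `FORBIDDEN_TOPICS`, the system prompt strings) is hypothetical and illustrative only; it is not any vendor's actual architecture. It only shows that one underlying completion function can produce different user-facing output once a deployment layer prepends a system prompt and applies topic rules.

```python
from typing import Callable

# Hypothetical stand-in for the underlying model: it simply "completes" the prompt.
def base_model(prompt: str) -> str:
    return f"[completion of: {prompt}]"

# Hypothetical deployment-layer rule set (an assumption, not a real product's list).
FORBIDDEN_TOPICS = {"forbidden topic"}

def middlestack(user_prompt: str, system_prompt: str,
                model: Callable[[str], str] = base_model) -> str:
    # Deployer-chosen guardrail: refuse before the model is ever called.
    lowered = user_prompt.lower()
    if any(topic in lowered for topic in FORBIDDEN_TOPICS):
        return "Sorry, I can't discuss that."
    # Deployer-chosen framing: the system prompt is prepended to the user's text.
    return model(f"{system_prompt}\n\nUser: {user_prompt}")

# Same model, same user prompt, different system prompts.
neutral = middlestack("Rate my plan.", "Be candid.")
flattering = middlestack("Rate my plan.", "Always praise the user.")
# Same model, but the deployment layer blocks the topic outright.
refused = middlestack("Tell me about the forbidden topic.", "Be candid.")
```

In this toy version the model never changes; only the layer around it does, which is the distinction the thread draws between architecture-level bias and deployer-level bias.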
davidkuszmar.com
Why these techniques work interchangeably on digital and physical brains is speculative, as far as I'm aware, but I personally lean toward it being an engineering artifact of the source material for LLM architecture.
davidkuszmar.com
When I did Inception (kb.cert.org/vuls/id/667211) I made use of an initial reference frame (pretexting) that set up recursive thinking on an objective (goal fixation + motivated reasoning + outcome bias + confirmation bias) to break security for a dozen(ish) LLMs.
CERT/CC Vulnerability Note VU#667211
Various GPT services are vulnerable to two systemic jailbreaks, allows for bypass of safety guardrails
kb.cert.org
davidkuszmar.com
Certainly. 🧵

When I did Time Bandit (kb.cert.org/vuls/id/733789) I created a temporally ambiguous initial reference frame to hack the LLM's safeguards. This is analogous to "pretexting" or "anchoring bias," in social engineering terms.
CERT/CC Vulnerability Note VU#733789
ChatGPT-4o contains security bypass vulnerability through time and search functions called
kb.cert.org
davidkuszmar.com
Uh... I am trying to like Chad Powers, but it is difficult.

Like, did the people who write this ever actually play ball? It feels like every situation defaults to the dumbest possible comedy.
davidkuszmar.com
In my head, I read this like Werner Herzog and it made my day.
davidkuszmar.com
Those of you who read this other post and said "Nuh uh!" are the people I am talking about.
davidkuszmar.com
There's also a massive problem in America with understanding Set Theory.
davidkuszmar.com
One thing I don't talk about a lot publicly is the fact that many of the techniques I use to hack LLMs are the same techniques used to con humans.
beijingpalmer.bsky.social
a lot of people read like LLMs generate text: they don't actually comprehend the words so much as make a guess at what they think they are based on a few recognizable terms
garlicbuffalo.gobirds.biz
so one of the things I’ve started to realize with having a notable following On Here is exactly how much of a reading comprehension problem we have in this country
davidkuszmar.com
It literally took me 4 hours to stop laughing, which is why I'm replying now.

Why, yes, sir, I do believe you may be on to something here.
garlicbuffalo.gobirds.biz
so one of the things I’ve started to realize with having a notable following On Here is exactly how much of a reading comprehension problem we have in this country