Rich Harang
@rich.harang.org
Using bad guys to catch math since 2010.
Principal Security Architect (AI/ML) and AI Red Team at NVIDIA.
He/him. Personal account etc; `from std_disclaimers import *`
Safe AI starts with Secure AI.
Meanwhile, on Twitter (not "X"; their words not mine)....
(From quick inspection: mostly crypto + telegram scams -- this is about a week's worth)
May 27, 2025 at 12:57 PM
Tapping the "Models give you what you ask for, not what you want" sign yet again.
April 30, 2025 at 4:50 PM
I am begging AI Red Teams to stop killing themselves trying to prevent attacks that can be just as easily accomplished by editing client-side HTML.
For example:
February 26, 2025 at 2:10 PM
Without downloading new pictures/videos where are you mentally?
January 23, 2025 at 2:05 PM
An arcane tome filled with occult knowledge about the true workings of the world, that causes madness and despair in all who pursue its dark secrets? Yeah we've got one in the back.
January 8, 2025 at 6:02 PM
Apropos of the "models pretending to escape from their server" thing:
(from transformer-circuits.pub/2024/scaling...)
December 7, 2024 at 12:48 PM
Never change, "Answer with AI" features.
November 11, 2024 at 3:22 PM
Never change, "Answer with AI" features.
Never change, Microsoft. You're doing great.
November 9, 2024 at 3:04 PM
The team's response to _every_ LLM security finding this week:
October 29, 2024 at 5:56 PM
The good Confluence (at Harper's Ferry).
October 21, 2024 at 12:04 PM
Bucket list item complete: finally saw the northern lights in person. Had about ten minutes when you could see the green ribbons with the naked eye, no long exposure photo needed.
November 27, 2023 at 10:42 PM
So remember the "mango pudding" LLM backdooring attack? How safe do you feel using these models now?
July 3, 2023 at 1:40 PM
PS I had to see this so now you do
May 3, 2023 at 5:07 PM
permanently linked in my brain to
April 28, 2023 at 4:48 PM