rylanschaeffer.bsky.social
@rylanschaeffer.bsky.social
www.biorxiv.org/content/10.1... is the most exciting new advancement in #neuroscience I've seen in a while!

A brief 🧵: The shared dream of neuroscience and physiology has been to understand how behavior and homeostasis emerge from continuous loops of exchange between brain and body.

1/N
Imaging cellular activity simultaneously across all organs of a vertebrate reveals body-wide circuits
All cells in an animal collectively ensure, moment-to-moment, the survival of the whole organism in the face of environmental stressors [1,2]. Physiology seeks to elucidate the intricate networ...
www.biorxiv.org
August 23, 2025 at 6:45 PM
What happens when "If at first you don't succeed, try again" meets modern ML/AI insights about scaling up?

You jailbreak every model on the market😱😱😱

Fire work led by @jplhughes.bsky.social, Sara Price, @aengusl.bsky.social, Mrinank Sharma, Ethan Perez

arxiv.org/abs/2412.03556
December 13, 2024 at 4:51 PM
🚨🛡️Jailbreak Defense in a Narrow Domain 🛡️🚨

This paper disentangles whether jailbreak defenses struggle because they must defend against so many diverse threats OR because defending against even a single threat is innately hard

1/2
🚨🛡️ Jailbreak Defense in a Narrow Domain 🛡️🚨

Jailbreaking is easy. Defending is hard. Might defending against a single, narrow behavior be easier?

Even in this focused setting, all defenses fail 😱 arxiv.org/abs/2412.02159

Appearing at @AdvMLFrontiers (Oral) & @solarneurips #NeurIPS2024
Jailbreak Defense in a Narrow Domain: Limitations of Existing...
Defending large language models against jailbreaks so that they never engage in a broadly-defined set of forbidden behaviors is an open problem. In this paper, we investigate the difficulty of...
arxiv.org
December 6, 2024 at 5:09 PM
I'm disappointed I needed to say this on an #ICLR2025 submission:

"While the authors are well within their rights to challenge prior work, suppressing contradictory evidence is antithetical to the scientific process and muddies the research community's understanding"
December 3, 2024 at 4:18 AM