»Differentially Private Steering for Large Language Model Alignment« by @anmolgoel.bsky.social, Yaxi Hu, Iryna Gurevych (@igurevych.bsky.social) & Amartya Sanyal (@amartyasanyal.bsky.social)
(2/🧵)