Pura Peetathawatchai
@poonpura.bsky.social
M.S. Computer Science @Stanford. Interested in machine learning privacy, AI security, diffusion models, cryptography, and AI for the environment, healthcare, and education

🌱 poonpura.github.io
🙏🙏🙏
November 27, 2024 at 6:48 PM
🎓 I am also applying for PhD programs this Fall! If you think I am a good fit for your lab, please contact me at [email protected] 😄
November 27, 2024 at 6:43 PM
For details, check out our paper (feedback appreciated!):

📄: arxiv.org/abs/2411.14639
🙌: big thank you to my collaborators and mentors Wei-Ning Chen, @berivanisik.bsky.social, Sanmi Koyejo, Albert No
🧵 16/16
Differentially Private Adaptation of Diffusion Models via Noisy Aggregated Embeddings
We introduce novel methods for adapting diffusion models under differential privacy (DP) constraints, enabling privacy-preserving style and content transfer without fine-tuning. Traditional approaches...
arxiv.org
November 27, 2024 at 6:43 PM
We tried generating images using different values of subsample size (m) and DP parameter ε. Our results were particularly good for Textual Inversion (TI)!

🧵 15/16
November 27, 2024 at 6:43 PM
We tested the effectiveness of our approach on two different target datasets: a collection of artworks from an artist (with consent, see her art on Instagram: @eveismyname) and the Paris 2024 Olympic pictograms (approved for non-commercial editorial use, ©️IOC - 2023)

🧵 14/16
November 27, 2024 at 6:43 PM
By aggregating over only a small subsample of the target embeddings, we can strengthen our DP guarantees. This lets us achieve the same privacy level with much less noise, and hence much better image quality! ✨

🧵 13/16
November 27, 2024 at 6:43 PM
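A minimal sketch of the fixed-size subsampling step (using subsample size m, drawn without replacement; the amplified privacy accounting itself, per [1] in the next post, is omitted here, and the function name is just illustrative):

```python
import numpy as np

def subsampled_mean(embeddings, m, rng=None):
    """Draw m of the n target embeddings uniformly without replacement,
    then average only those. Privacy amplification by subsampling means
    the subsequent Gaussian mechanism can use less noise for the same
    (eps, delta) guarantee than aggregating over all n embeddings."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(embeddings), size=m, replace=False)
    return embeddings[idx].mean(axis=0)
```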
For a further privacy-utility boost, we can also introduce subsampling [1].

[1] arxiv.org/abs/2210.00597
🧵 12/16
November 27, 2024 at 6:43 PM
4. Apply noisy aggregated embedding to Style Guidance or Textual Inversion 🔥
5. Serve and enjoy! 🍴

For details, see our paper:
📄: arxiv.org/abs/2411.14639
🧵 11/16
November 27, 2024 at 6:43 PM
Our recipe can be summarized as follows: 🍳

1. Obtain an embedding vector for each image in the target dataset 🌿
2. Aggregate the embeddings to limit sensitivity to any individual image 🥣
3. Add DP noise using the Gaussian mechanism 🧂

🧵 10/16
November 27, 2024 at 6:43 PM
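Steps 2-3 of the recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming each embedding is L2-clipped so that one image's contribution to the mean is bounded; the function name and clipping convention are mine, not the paper's:

```python
import numpy as np

def dp_aggregate(embeddings, eps, delta, clip_norm=1.0, rng=None):
    """Privately aggregate per-image embeddings with the Gaussian mechanism.

    embeddings: (n, d) array, one row per image in the target dataset.
    Clipping each row to clip_norm bounds the L2 sensitivity of the
    mean at clip_norm / n (one image shifts the sum by at most clip_norm).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = embeddings.shape
    # Step 2: clip each embedding, then average to limit per-image influence.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    clipped = embeddings * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    mean = clipped.mean(axis=0)
    # Step 3: Gaussian mechanism calibrated to (eps, delta)-DP.
    sensitivity = clip_norm / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return mean + rng.normal(0.0, sigma, size=d)
```

Note how the noise scale shrinks with n: larger target datasets get the same guarantee with less distortion of the aggregated embedding.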
2. Textual Inversion [1] (use the target dataset to train a new token embedding vector that is later used in the text prompt during image generation)

[1] arxiv.org/abs/2208.01618
🧵 9/16
November 27, 2024 at 6:43 PM
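As a toy illustration of the Textual Inversion idea, here a frozen linear "decoder" D stands in for the frozen diffusion model, and we gradient-descend a single new token embedding v (everything else fixed). This is my simplification for intuition only; the real method minimizes the diffusion denoising loss through the frozen U-Net:

```python
import numpy as np

def learn_token_embedding(D, target, steps=300, lr=0.1, rng=None):
    """Optimize one new token embedding v so the frozen map D takes v
    close to the target embedding, via plain gradient descent on
    ||D v - target||^2. Only v is trainable; D never changes."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.normal(scale=0.01, size=D.shape[1])   # the one trainable vector
    for _ in range(steps):
        grad = 2.0 * D.T @ (D @ v - target)       # d/dv ||Dv - target||^2
        v -= lr * grad
    return v
```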
1. Universal Guidance’s CLIP style guidance [1] (guide image towards target CLIP embedding during image generation)

[1] arxiv.org/abs/2302.07121
🧵 8/16
November 27, 2024 at 6:43 PM
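A rough sketch of one guidance step, where a fixed linear map W stands in for the CLIP image encoder (an assumed simplification; the real method backpropagates the similarity through CLIP itself):

```python
import numpy as np

def cosine_guidance_grad(x, W, target):
    """Gradient of cos(Wx, target) w.r.t. x, with the linear map W
    standing in for the CLIP image encoder."""
    e = W @ x
    t = target / np.linalg.norm(target)
    en = np.linalg.norm(e)
    # d/de cos(e, t) = t/|e| - (e . t) e / |e|^3
    grad_e = t / en - (e @ t) * e / en**3
    return W.T @ grad_e

def guided_denoise_step(x, model_update, W, target, scale=0.01):
    """Apply the usual denoiser update, then nudge x toward higher
    cosine similarity between its (stand-in) embedding and the target."""
    return x + model_update + scale * cosine_guidance_grad(x, W, target)
```

Repeating this nudge across the sampling trajectory is what steers the generated image toward the target style embedding.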
But here, we propose a new approach using embedding vectors.

Our work applies DP to known diffusion model adaptation methods that encode the target dataset into an embedding vector, including:

🧵 7/16
November 27, 2024 at 6:43 PM
We therefore turn to other DP approaches that don’t require full training using DP-SGD. Some work has been done on this, such as DP-LoRA [1] (utilizing Low-Rank Adaptation) and DP-RDM [2] (utilizing Retrieval Augmented Generation).

[1] arxiv.org/abs/2110.06500
[2] arxiv.org/abs/2403.14421
🧵 6/16
November 27, 2024 at 6:43 PM
But while DP-SGD is powerful, it struggles with:
1. High computational costs
2. Incompatibility with batch normalization
3. Severe degradation in image quality

🧵 5/16
November 27, 2024 at 6:43 PM
The first solution that comes to mind is differential privacy (DP), which adds noise to provide data privacy. DP-SGD [1] is particularly popular for neural networks, and work has been done to adapt DP-SGD to diffusion models.

[1] arxiv.org/abs/1607.00133
🧵 4/16
November 27, 2024 at 6:43 PM
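For reference, the core DP-SGD update is per-example gradient clipping plus Gaussian noise. A minimal NumPy sketch (not any specific library's implementation; the privacy cost over many steps is tracked separately by a moments/RDP accountant):

```python
import numpy as np

def dpsgd_step(params, per_example_grads, lr, clip_norm, noise_mult, rng=None):
    """One DP-SGD update: clip each example's gradient, sum, add noise.

    per_example_grads: (batch, d) array of per-example gradients.
    noise_mult is the noise multiplier (noise stddev = noise_mult * clip_norm),
    which together with the sampling rate determines the privacy cost.
    """
    rng = np.random.default_rng() if rng is None else rng
    b, d = per_example_grads.shape
    # Clip each example's gradient to bound any one example's influence.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Sum, add calibrated Gaussian noise, then average over the batch.
    noisy_grad = (clipped.sum(axis=0)
                  + rng.normal(0.0, noise_mult * clip_norm, size=d)) / b
    return params - lr * noisy_grad
```

The need to materialize per-example gradients is exactly where the high computational cost (point 1 in the next post) comes from.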
This means the model might directly recreate training images instead of generalizing patterns. This poses copyright concerns for artists and privacy issues for sensitive datasets. ©️

🧵 3/16
November 27, 2024 at 6:43 PM
Diffusion models like Stable Diffusion have revolutionized image generation and can be personalized on smaller datasets to capture specific objects or styles. But personalizing on small datasets risks memorization.

🧵 2/16
November 27, 2024 at 6:43 PM