Zihao Zhao
@zihaozhao.bsky.social
PhD student @jhuclsp.bsky.social| AI safety & privacy
Previous: Undergrad @jhucompsci.bsky.social
Thank you to @anjalief.bsky.social for advising. Hands-on with DP-SGD? Start with another paper of ours and its open-source package
(arxiv.org/abs/2507.07229
github.com/kr-ramesh/sy...)
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains
We present SynthTextEval, a toolkit for conducting comprehensive evaluations of synthetic text. The fluency of large language model (LLM) outputs has made synthetic text potentially viable for numerou...
arxiv.org
October 15, 2025 at 8:24 PM
🔗 Paper & code
Paper is accepted to EMNLP 2025 Main
arXiv: arxiv.org/abs/2509.25729
Code: github.com/zzhao71/Cont...
#SyntheticData #Privacy #NLP #LLM #Deidentification #HealthcareAI
Controlled Generation for Private Synthetic Text
Text anonymization is essential for responsibly developing and deploying AI in high-stakes domains such as healthcare, social services, and law. In this work, we propose a novel methodology for privac...
arxiv.org
October 15, 2025 at 8:24 PM
4/5 📈 Utility
On TAB, prefix-tuning+masking gives best utility (Perplexity ≈ 10.2, MAUVE ≈ 0.83), beating ICL and DP-SGD. Similar trends on MIMIC-III.
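For readers new to the metric: perplexity is just the exponential of the mean per-token negative log-likelihood, so lower is better. A minimal stdlib sketch (not the paper's evaluation code; the token log-probs here are made up for illustration):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model assigning every token probability 1/10 has perplexity ≈ 10,
# in the same ballpark as the ≈ 10.2 reported on TAB.
ppl = perplexity([math.log(0.1)] * 5)
```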
October 15, 2025 at 8:24 PM
3/5 🔒 Privacy
ICL+blocking: ~0.00% privacy leakage (avg in our runs).
Prefix-tuning+masking yields the lowest ROUGE vs training data (e.g., ROUGE-L ≈ 0.098), indicating less copying.
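ROUGE-L measures the longest common subsequence between a generated text and a training document, so a low score means less verbatim copying. A self-contained sketch of the F1 variant (a toy reimplementation on whitespace tokens, not the paper's evaluation pipeline):

```python
def lcs_len(a, b):
    """Classic dynamic-programming longest common subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 on whitespace tokens: harmonic mean of LCS precision/recall."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)

score = rouge_l_f1("the patient was discharged", "the patient went home")
```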
October 15, 2025 at 8:24 PM
2/5 🔧 How it works
• Build control codes from detected private entities (PERSON, ORG, LOC, etc.).
• Generate with either ICL (and block those identifiers at decode time) or prefix-tuning with a privacy mask + KL/contrastive losses.
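The two steps above can be sketched in a toy form: derive type-level control codes plus a blocklist from NER output, then drop any candidate next token that would reproduce a blocked identifier. This is a simplified illustration (the entity list, single-token identifiers, and function names are hypothetical, not the paper's implementation):

```python
def build_control_codes(entities):
    """entities: list of (surface_string, type) pairs from a NER pass.
    Returns type-level control codes and a blocklist of raw identifiers."""
    codes = [f"<{etype}>" for _, etype in entities]
    blocklist = {surface.lower() for surface, _ in entities}
    return codes, blocklist

def filter_candidates(candidates, blocklist):
    """Decode-time blocking: drop candidate (token, prob) pairs that
    would emit a blocked identifier verbatim."""
    return [(tok, p) for tok, p in candidates if tok.lower() not in blocklist]

codes, block = build_control_codes([("John Smith", "PERSON"), ("Mercy Hospital", "ORG")])
safe = filter_candidates([("John Smith", 0.4), ("the", 0.3)], block)
```

In a real decoder the blocklist would be applied to the logits over subword tokens (e.g. via a bad-words constraint) rather than to whole strings, but the idea is the same: the model conditions on `<PERSON>`-style codes while the original identifiers are unreachable at generation time.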
October 15, 2025 at 8:24 PM