Krithika Ramesh
@stolenpyjak.bsky.social
(she/her)
¯\_(ツ)_/¯
PhD student @jhuclsp | Prev @IndiaMSR
SynthTextEval was developed in close collaboration with
Daniel Smolyak, @zihaozhao.bsky.social, Nupoor Gandhi, Ritu Agarwal, Margrét Bjarnadóttir, @anjalief.bsky.social
@jhuclsp.bsky.social @jhucompsci.bsky.social
Stop by to see our work at EMNLP tomorrow, which Zihao will be presenting!
GitHub - kr-ramesh/synthtexteval: SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration)
SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration) - kr-ramesh/synthtexteval
github.com
November 7, 2025 at 12:53 AM
SynthTextEval is a comprehensive toolkit for evaluating synthetic text data with a wide range of metrics, enabling standardized, comparable assessments of generation approaches and building greater confidence in the quality of synthetic data, especially for high-stakes domains
November 7, 2025 at 12:53 AM
Synthetic data shouldn’t be a black box - we make it easier to examine and identify issues in synthetic data outputs with
- Interactive text exploration & review with our GUI tool
- Exploring text diversity, structure and themes with our visual and descriptive text analyses tools
November 7, 2025 at 12:53 AM
SynthTextEval also supports fine-tuning models for controllable text generation across diverse domains, which allows users to
- Produce text tailored to user-defined styles, content types, or domain labels
- Generate synthetic data with differentially private guarantees
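The label-conditioned generation described above typically works by prepending a control tag to each training example, so the fine-tuned model learns to condition its output on the tag. A minimal sketch of that data-formatting step, in plain Python — the tag format, field names, and example records here are illustrative, not SynthTextEval's actual API:

```python
# Sketch of control-code conditioning: prepend a user-defined label tag to each
# training example so a fine-tuned model learns to generate text conditioned on it.
# Tag format and field names are hypothetical, not the toolkit's own interface.

def format_with_control_code(record, label_field="domain", text_field="text"):
    """Turn {'domain': 'cardiology', 'text': '...'} into '<domain=cardiology> ...'."""
    tag = f"<{label_field}={record[label_field]}>"
    return f"{tag} {record[text_field]}"

records = [
    {"domain": "cardiology", "text": "Patient presents with chest pain."},
    {"domain": "legal", "text": "The parties agree to the following terms."},
]

corpus = [format_with_control_code(r) for r in records]
# At generation time, prompting the fine-tuned model with '<domain=legal>'
# steers it toward that label's style and content.
```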
November 7, 2025 at 12:53 AM
🔧Utility: Downstream task-based evaluations (classification, coreference resolution)
📊Fairness: Distributional balance & representational biases
🔐Privacy: Leakage, memorization, and re-identification risk
📜Quality: Distributional differences between synthetic and real text
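The utility dimension is often measured with a "train on synthetic, test on real" protocol: fit a downstream classifier on synthetic text and check its accuracy on held-out real text. A self-contained toy sketch of that protocol using a bag-of-words nearest-centroid classifier — a real pipeline would use a proper classifier (e.g. scikit-learn or a fine-tuned encoder), and the corpora here are made up:

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words token counts for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labeled_texts):
    """Sum token counts per label to form one centroid per class."""
    centroids = {}
    for label, text in labeled_texts:
        centroids.setdefault(label, Counter()).update(bow(text))
    return centroids

def predict(centroids, text):
    v = bow(text)
    return max(centroids, key=lambda lbl: cosine(centroids[lbl], v))

# Train on (toy) synthetic data, evaluate on (toy) real data.
synthetic = [("med", "patient diagnosis treatment hospital"),
             ("law", "court ruling statute contract")]
real = [("med", "the patient received treatment at the hospital"),
        ("law", "the court issued a ruling on the contract")]

centroids = train_centroids(synthetic)
accuracy = sum(predict(centroids, t) == y for y, t in real) / len(real)
```

If synthetic data preserves the label-relevant signal of the real data, this accuracy should approach what training on real data would give.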
November 7, 2025 at 12:53 AM
Conventional metrics like BLEU, ROUGE, or perplexity only scratch the surface of synthetic text quality!
Our framework introduces a multi-dimensional evaluation suite that covers aspects such as utility, privacy, fairness and distributional similarity to the real data.
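One common way to quantify the distributional-similarity dimension is Jensen-Shannon divergence between the token distributions of the real and synthetic corpora (0 for identical distributions, ln 2 for disjoint ones). A minimal pure-Python sketch over unigrams — the corpora are invented, and production code would use a library routine such as `scipy.spatial.distance.jensenshannon`:

```python
from collections import Counter
import math

def unigram_dist(texts):
    """Normalized unigram distribution over a list of documents."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def kl(p, q):
    # KL(p || q) over p's support; q covers it when q is the mixture below.
    return sum(pv * math.log(pv / q[tok]) for tok, pv in p.items())

def js_divergence(p, q):
    m = {tok: 0.5 * (p.get(tok, 0.0) + q.get(tok, 0.0)) for tok in set(p) | set(q)}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)  # 0 = identical, ln(2) = disjoint

real = ["the patient was admitted", "the patient recovered fully"]
synthetic = ["the patient was admitted", "a patient was discharged"]

jsd = js_divergence(unigram_dist(real), unigram_dist(synthetic))
jsd_same = js_divergence(unigram_dist(real), unigram_dist(real))
jsd_disjoint = js_divergence({"a": 1.0}, {"b": 1.0})
```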
November 7, 2025 at 12:53 AM
Reposted by Krithika Ramesh
Thank you to @anjalief.bsky.social for advising. Want hands-on experience with DP-SGD? Start with another paper of ours and its open-source package
(arxiv.org/abs/2507.07229
github.com/kr-ramesh/sy...)
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains
We present SynthTextEval, a toolkit for conducting comprehensive evaluations of synthetic text. The fluency of large language model (LLM) outputs has made synthetic text potentially viable for numerou...
arxiv.org
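DP-SGD boils down to two operations per step: clip each per-example gradient to a fixed norm, then add Gaussian noise scaled by that norm before averaging. A toy pure-Python illustration on a 1-D linear model — this is a pedagogical sketch, not the paper's method; real training would use a library such as Opacus, and all hyperparameters here are made up:

```python
import random

def dp_sgd_step(w, batch, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD step for a 1-D linear model y = w*x with squared loss.

    Each per-example gradient is clipped to `clip_norm`, then Gaussian noise
    with std `noise_multiplier * clip_norm` is added to the sum before averaging.
    """
    rng = rng or random.Random(0)
    grad_sum = 0.0
    for x, y in batch:
        g = 2.0 * (w * x - y) * x                        # per-example gradient
        g *= min(1.0, clip_norm / (abs(g) + 1e-12))      # clip to clip_norm
        grad_sum += g
    noise = rng.gauss(0.0, noise_multiplier * clip_norm)  # Gaussian mechanism
    return w - lr * (grad_sum + noise) / len(batch)

rng = random.Random(0)
batch = [(1.0, 2.0), (2.0, 4.0)]  # toy data drawn from y = 2x
w = 0.0
for _ in range(200):
    w = dp_sgd_step(w, batch, rng=rng)
# w ends up near 2, despite clipping and noise; the privacy guarantee
# comes from the clip/noise pair, accounted over all steps.
```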
October 15, 2025 at 8:24 PM
Reposted by Krithika Ramesh
🔗 Paper & code
Paper is accepted to EMNLP 2025 Main
arXiv: arxiv.org/abs/2509.25729
Code: github.com/zzhao71/Cont...
#SyntheticData #Privacy #NLP #LLM #Deidentification #HealthcareAI
Controlled Generation for Private Synthetic Text
Text anonymization is essential for responsibly developing and deploying AI in high-stakes domains such as healthcare, social services, and law. In this work, we propose a novel methodology for privac...
arxiv.org
October 15, 2025 at 8:24 PM
Reposted by Krithika Ramesh
This hypothesis says that 1) Multilingual generation uses a model-internal task-solving→translation cascade. 2) Failure of the translation stage *despite task-solving success* is a large part of the problem. That is, the model often solves the task but fails to articulate the answer.
July 4, 2025 at 5:05 PM
Reposted by Krithika Ramesh
Go find new linguistic changes, compare corpora and invent
huggingface.co/Hplm
arxiv.org/abs/2504.05523
Hplm (Historical Perspectival LM)
Org profile for Historical Perspectival LM on Hugging Face, the AI community building the future.
huggingface.co
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
Historical analysis is a good example, as historical periods can get lost in blended information from different eras. Fine-tuning large models isn't enough; they “leak” future/modern concepts, making historical analysis impossible. Did you know cars existed in the 1800s? 🤦
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
arxiv.org/abs/2504.05523
Typical Large Language Models (LLMs) are trained on massive, mixed datasets, so the model's behaviour can't be linked to a specific subset of the pretraining data. Or in our case, to time eras.
Pretraining Language Models for Diachronic Linguistic Change Discovery
Large language models (LLMs) have shown potential as tools for scientific discovery. This has engendered growing interest in their use in humanistic disciplines, such as historical linguistics and lit...
arxiv.org
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
Form here: forms.gle/6DRkaP1CTMYk...
MASC 2025 Call for Locations
Are you able to host MASC this year, sometime in Spring 2025?
Responsibilities include:
Space for ~150 ish people
Managing the review process (really just paper submissions)
Organizing the event
Choo...
forms.gle
December 16, 2024 at 9:26 PM