Krithika Ramesh
@stolenpyjak.bsky.social
(she/her)
¯\_(ツ)_/¯
PhD student @jhuclsp | Prev @IndiaMSR
SynthTextEval was developed in close collaboration with
Daniel Smolyak, @zihaozhao.bsky.social, Nupoor Gandhi, Ritu Agarwal, Margrét Bjarnadóttir, @anjalief.bsky.social
@jhuclsp.bsky.social @jhucompsci.bsky.social
Stop by to see our work at EMNLP tomorrow, which Zihao will be presenting!
GitHub - kr-ramesh/synthtexteval: SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration)
SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data Across Domains (EMNLP 2025 System Demonstration) - kr-ramesh/synthtexteval
github.com
November 7, 2025 at 12:53 AM
SynthTextEval is a comprehensive toolkit for evaluating synthetic text data with a wide range of metrics, enabling standardized, comparable assessments of generation approaches and building greater confidence in the quality of synthetic data, especially for high-stakes domains
November 7, 2025 at 12:53 AM
Synthetic data shouldn’t be a black box - we make it easier to examine and identify issues in synthetic data outputs with
- Interactive text exploration & review with our GUI tool
- Exploring text diversity, structure and themes with our visual and descriptive text analyses tools
November 7, 2025 at 12:53 AM
SynthTextEval also supports fine-tuning models for controllable text generation across diverse domains, which allows users to
- Produce text tailored to user-defined styles, content types, or domain labels
- Generate synthetic data with differentially private guarantees
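The label-conditioned generation described above typically works by prepending a control tag to each training example, so the fine-tuned model learns to condition its output on the tag. A minimal sketch of that data-formatting step, in plain Python — the tag format, field names, and example records here are illustrative, not SynthTextEval's actual API:

```python
# Sketch of control-code conditioning: prepend a user-defined label tag to each
# training example so a fine-tuned model learns to generate text conditioned on it.
# Tag format and field names are hypothetical, not the toolkit's own interface.

def format_with_control_code(record, label_field="domain", text_field="text"):
    """Turn {'domain': 'cardiology', 'text': '...'} into '<domain=cardiology> ...'."""
    tag = f"<{label_field}={record[label_field]}>"
    return f"{tag} {record[text_field]}"

records = [
    {"domain": "cardiology", "text": "Patient presents with chest pain."},
    {"domain": "legal", "text": "The parties agree to the following terms."},
]

corpus = [format_with_control_code(r) for r in records]
# At generation time, prompting the fine-tuned model with '<domain=legal>'
# steers it toward that label's style and content.
```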
November 7, 2025 at 12:53 AM
🔧Utility: Downstream task-based evaluations (classification, coreference resolution)
📊Fairness: Distributional balance & representational biases
🔐Privacy: Leakage, memorization, and re-identification risk
📜Quality: Distributional differences between synthetic and real text
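The utility dimension is often measured with a "train on synthetic, test on real" protocol: fit a downstream classifier on synthetic text and check its accuracy on held-out real text. A self-contained toy sketch of that protocol using a bag-of-words nearest-centroid classifier — a real pipeline would use a proper classifier (e.g. scikit-learn or a fine-tuned encoder), and the corpora here are made up:

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words token counts for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labeled_texts):
    """Sum token counts per label to form one centroid per class."""
    centroids = {}
    for label, text in labeled_texts:
        centroids.setdefault(label, Counter()).update(bow(text))
    return centroids

def predict(centroids, text):
    v = bow(text)
    return max(centroids, key=lambda lbl: cosine(centroids[lbl], v))

# Train on (toy) synthetic data, evaluate on (toy) real data.
synthetic = [("med", "patient diagnosis treatment hospital"),
             ("law", "court ruling statute contract")]
real = [("med", "the patient received treatment at the hospital"),
        ("law", "the court issued a ruling on the contract")]

centroids = train_centroids(synthetic)
accuracy = sum(predict(centroids, t) == y for y, t in real) / len(real)
```

If synthetic data preserves the label-relevant signal of the real data, this accuracy should approach what training on real data would give.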
November 7, 2025 at 12:53 AM
Conventional metrics like BLEU, ROUGE, or perplexity only scratch the surface of synthetic text quality!
Our framework introduces a multi-dimensional evaluation suite that covers aspects such as utility, privacy, fairness and distributional similarity to the real data.
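One common way to quantify the distributional-similarity dimension is Jensen-Shannon divergence between the token distributions of the real and synthetic corpora (0 for identical distributions, ln 2 for disjoint ones). A minimal pure-Python sketch over unigrams — the corpora are invented, and production code would use a library routine such as `scipy.spatial.distance.jensenshannon`:

```python
from collections import Counter
import math

def unigram_dist(texts):
    """Normalized unigram distribution over a list of documents."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def kl(p, q):
    # KL(p || q) over p's support; q covers it when q is the mixture below.
    return sum(pv * math.log(pv / q[tok]) for tok, pv in p.items())

def js_divergence(p, q):
    m = {tok: 0.5 * (p.get(tok, 0.0) + q.get(tok, 0.0)) for tok in set(p) | set(q)}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)  # 0 = identical, ln(2) = disjoint

real = ["the patient was admitted", "the patient recovered fully"]
synthetic = ["the patient was admitted", "a patient was discharged"]

jsd = js_divergence(unigram_dist(real), unigram_dist(synthetic))
jsd_same = js_divergence(unigram_dist(real), unigram_dist(real))
jsd_disjoint = js_divergence({"a": 1.0}, {"b": 1.0})
```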
November 7, 2025 at 12:53 AM
Reposted by Krithika Ramesh
Thank you to @anjalief.bsky.social for advising. Want hands-on experience with DP-SGD? Start with another paper of ours and its open-source package
(arxiv.org/abs/2507.07229
github.com/kr-ramesh/sy...)
SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains
We present SynthTextEval, a toolkit for conducting comprehensive evaluations of synthetic text. The fluency of large language model (LLM) outputs has made synthetic text potentially viable for numerou...
arxiv.org
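DP-SGD boils down to two operations per step: clip each per-example gradient to a fixed norm, then add Gaussian noise scaled by that norm before averaging. A toy pure-Python illustration on a 1-D linear model — this is a pedagogical sketch, not the paper's method; real training would use a library such as Opacus, and all hyperparameters here are made up:

```python
import random

def dp_sgd_step(w, batch, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD step for a 1-D linear model y = w*x with squared loss.

    Each per-example gradient is clipped to `clip_norm`, then Gaussian noise
    with std `noise_multiplier * clip_norm` is added to the sum before averaging.
    """
    rng = rng or random.Random(0)
    grad_sum = 0.0
    for x, y in batch:
        g = 2.0 * (w * x - y) * x                        # per-example gradient
        g *= min(1.0, clip_norm / (abs(g) + 1e-12))      # clip to clip_norm
        grad_sum += g
    noise = rng.gauss(0.0, noise_multiplier * clip_norm)  # Gaussian mechanism
    return w - lr * (grad_sum + noise) / len(batch)

rng = random.Random(0)
batch = [(1.0, 2.0), (2.0, 4.0)]  # toy data drawn from y = 2x
w = 0.0
for _ in range(200):
    w = dp_sgd_step(w, batch, rng=rng)
# w ends up near 2, despite clipping and noise; the privacy guarantee
# comes from the clip/noise pair, accounted over all steps.
```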
October 15, 2025 at 8:24 PM
Reposted by Krithika Ramesh
🔗 Paper & code
Paper is accepted to EMNLP 2025 Main
arXiv: arxiv.org/abs/2509.25729
Code: github.com/zzhao71/Cont...
#SyntheticData #Privacy #NLP #LLM #Deidentification #HealthcareAI
Controlled Generation for Private Synthetic Text
Text anonymization is essential for responsibly developing and deploying AI in high-stakes domains such as healthcare, social services, and law. In this work, we propose a novel methodology for privac...
arxiv.org
October 15, 2025 at 8:24 PM
Reposted by Krithika Ramesh
This hypothesis says that 1) Multilingual generation uses a model-internal task-solving→translation cascade. 2) Failure of the translation stage *despite task-solving success* is a large part of the problem. That is, the model often solves the task but fails to articulate the answer.
July 4, 2025 at 5:05 PM
Reposted by Krithika Ramesh
Go find new linguistic changes, compare corpora and invent
huggingface.co/Hplm
arxiv.org/abs/2504.05523
Hplm (Historical Perspectival LM)
Org profile for Historical Perspectival LM on Hugging Face, the AI community building the future.
huggingface.co
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
Historical analysis is a good example, as historical periods can get lost in blended information from different eras. Fine-tuning large models isn't enough; they “leak” future/modern concepts, making historical analysis impossible. Did you know cars existed in the 1800s? 🤦
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
arxiv.org/abs/2504.05523
Typical Large Language Models (LLMs) are trained on massive, mixed datasets, so the model's behaviour can't be linked to a specific subset of the pretraining data. Or in our case, to time eras.
Pretraining Language Models for Diachronic Linguistic Change Discovery
Large language models (LLMs) have shown potential as tools for scientific discovery. This has engendered growing interest in their use in humanistic disciplines, such as historical linguistics and lit...
arxiv.org
April 15, 2025 at 12:45 PM
Reposted by Krithika Ramesh
Form here: forms.gle/6DRkaP1CTMYk...
MASC 2025 Call for Locations
Are you able to host MASC this year, sometime in Spring 2025?
Responsibilities include:
Space for ~150 ish people
Managing the review process (really just paper submissions)
Organizing the event
Choo...
forms.gle
December 16, 2024 at 9:26 PM