Dong Nguyen
@dongng.bsky.social
NLP @ Utrecht University (NL) | https://www.dongnguyen.nl/ | NLP & Society Lab: https://nlpsoc.github.io/
New opinion paper out with Esther Ploeger (Aalborg University): We Need to Measure Data Diversity in NLP — Better and Broader at #EMNLP2025 (main) aclanthology.org/2025.emnlp-m...
We Need to Measure Data Diversity in NLP — Better and Broader
Dong Nguyen, Esther Ploeger. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 4, 2025 at 3:43 PM
Reposted by Dong Nguyen
How do language models memorize noise while reasoning impressively well?
Our #EMNLP2025 (poster, Nov 5, 11:00-12:30, Hall C) paper shows that memorization reuses internal mechanisms of generalization, even when they are not related to each other!
arxiv.org/abs/2507.04782
November 1, 2025 at 5:25 PM
Congrats Anna!! 🎉
I successfully defended my PhD in Dutch fashion and acquired a PhD certificate in Latin. Thank you to the amazing people that got me here, among others @dongng.bsky.social and the ones I blur here.
October 24, 2025 at 6:43 AM
Reposted by Dong Nguyen
Please share!
We have a number of fully funded PhD studentships in "Designing Responsible Natural Language Processing". I'm a possible supervisor & I'd be keen to support projects on sociolinguistics-AI, e.g., accent bias in AI, language+gender/sexuality+AI.
www.responsiblenlp.org
Our CDT is based in the Edinburgh Futures Institute – the University of Edinburgh’s brand new hub for research, innovation and teaching focused on socially just artificial intelligence and data.
www.responsiblenlp.org
October 10, 2025 at 3:03 PM
Reposted by Dong Nguyen
Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here: web.stanford.edu/~jurafsky/sl...
Speech and Language Processing
web.stanford.edu
August 24, 2025 at 7:28 PM
Reposted by Dong Nguyen
together with some Utrecht NLP people at ACL 2025! #acl2025 #acl2025NLP
July 27, 2025 at 7:48 PM
Reposted by Dong Nguyen
Wanna do some authorship attribution? Chances are what tokenizer you use matters.
Tokenization is Sensitive to Language Variation, probably, more investigation necessary...
📄 ACL Findings paper: arxiv.org/pdf/2502.15343
🧑🏫 @dongng.bsky.social @davidjurgens.bsky.social and myself
See you at ACL!
July 17, 2025 at 7:59 AM
Reposted by Dong Nguyen
The worst happened. We were DOGE’d. Our NSF funding is gone.
So now there’s nothing stopping me from sharing Expert Voices Together, a crisis response system for US-based researchers and journalists facing harassment.
It's a true passion project. 🧵 1/
expertvoicestogether.org
May 13, 2025 at 4:22 PM
I wrote down some thoughts about what sociolinguistics can contribute to LLMs and vice versa, now available dx.doi.org/10.1111/lnc3...
Collaborative Growth: When Large Language Models Meet Sociolinguistics
Large Language Models (LLMs) have dramatically transformed the AI landscape. They can produce remarkably fluent text and exhibit a range of natural language understanding and generation capabilities....
dx.doi.org
February 4, 2025 at 8:51 AM
Reposted by Dong Nguyen
🚨BREAKING. From a program officer at the National Science Foundation, a list of keywords that can cause a grant to be pulled. I will be sharing screenshots of these keywords along with a decision tree. Please share widely. This is a crisis for academic freedom & science.
February 4, 2025 at 1:26 AM
Reposted by Dong Nguyen
Are you a pre-doctoral student interested in language technologies, especially focusing on safe, fair and inclusive AI? Our Summer 2025 Language Technology for All Internship could be a great fit. See the link below for more info, and to apply:
lti.cs.cmu.edu/news-and-eve...
CMU LTI Language Technology for All Internship 2025 - Language Technologies Institute - School of Computer Science - Carnegie Mellon University
The LTI is currently seeking applicants for the summer 2025 Language Technology for All Internship
lti.cs.cmu.edu
January 6, 2025 at 9:23 PM
Congratulations to dr. Qixiang Fang for successfully defending his impressive thesis on "Leveraging Measurement Theory for Natural Language Processing Research" -- the first PhD student I advised from start to finish. It was an honor to be part of the journey. research-portal.uu.nl/en/publicati...
Leveraging Measurement Theory for Natural Language Processing Research
research-portal.uu.nl
December 6, 2024 at 2:44 PM
I'm looking for a PhD student for a new big project on "Data Diversity for Fair and Robust NLP" (📅 Jan 5!) www.uu.nl/en/organisat... #nlproc #nlp
PhD position on Data Diversity for Fair and Robust NLP (DataDivers project)
Dive into data diversity to make Natural Language Processing (NLP) models more fair and robust!
www.uu.nl
November 22, 2024 at 9:48 AM