Lightnews — Scholar-powered news

Dallas Card

@dallascard.bsky.social

2.7K followers 380 following 100 posts

Assistant professor at https://si.umich.edu/ working in computational social science, machine learning, and NLP | https://dallascard.github.io


Dallas Card

@dallascard.bsky.social

See also @manoelhortaribeiro.bsky.social's post on this same topic: doomscrollingbabel.manoel.xyz/p/labeling-d...

Labeling Data with Language Models: Trick or Treat?

Large language models are now labeling data for us.

doomscrollingbabel.manoel.xyz

November 19, 2025 at 3:44 PM

Dallas Card

@dallascard.bsky.social

Ah yes, good point! That's a careless misuse of language on my part. What I meant is that I think the resulting estimate should be correct in expectation (with respect to the sample of labeled/unlabeled data), regardless of the amount of labeled data, but with lower variance for larger samples.

November 18, 2025 at 4:25 PM

Dallas Card

@dallascard.bsky.social

I think you're right, although I also cynically expect that a unique first author requirement would lead to a lot of fake first authors (depending on the venue) 😅

November 15, 2025 at 11:22 PM

Dallas Card

@dallascard.bsky.social

I'll also selfishly highlight my own earlier work on quality filtering: aclanthology.org/2022.emnlp-m...

Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection

Suchin Gururangan, Dallas Card, Sarah Dreier, Emily Gade, Leroy Wang, Zeyu Wang, Luke Zettlemoyer, Noah A. Smith. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing...

aclanthology.org

November 14, 2025 at 10:13 PM

Dallas Card

@dallascard.bsky.social

Excellent thread! This also reminds me of David Bamman's work on films as data, which, if I understand correctly, you *are* legally allowed to use for research, as long as you own and retain a physical copy, and as long as you don't enjoy watching it : )

www.pnas.org/doi/abs/10.1...

Measuring diversity in Hollywood through the large-scale computational analysis of film | PNAS

Movies are a massively popular and influential form of media, but their computational study at scale has largely been off-limits to researchers in ...

www.pnas.org

November 14, 2025 at 10:12 PM

Dallas Card

@dallascard.bsky.social

I wonder what would happen if a major conference made a rule that each author is only allowed to submit one paper per cycle? Obviously total submissions would be much smaller, and many papers would be redirected elsewhere, but could they convince people to only send their best work?

November 14, 2025 at 10:00 PM
