Armin Pournaki
pournaki.bsky.social
PhD candidate at MPI MiS, Lattice and SciencesPo médialab | Computational Social Science, Narratives, NLP, Complex Networks | https://pournaki.com
That's interesting! Intuitively, I'd think that you need a certain document length for LDA to reliably capture repeated word co-occurrences as signals for underlying topics... It would be quite useful to do a systematic comparison that doesn't have all the flaws of the above-mentioned paper!
December 9, 2025 at 12:21 PM
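(Not part of the thread: a minimal, stdlib-only sketch of the intuition above. It counts within-document word pairs, the co-occurrence signal LDA's generative model exploits, for a few tweet-length toy documents versus one longer toy document. All document strings and the helper name `cooccurrence_pairs` are illustrative assumptions, not from the conversation.)

```python
from itertools import combinations
from collections import Counter

def cooccurrence_pairs(doc):
    """Count unordered pairs of distinct words co-occurring in one document.

    Illustrative helper: within-document co-occurrence is the raw signal
    a topic model like LDA has to work with.
    """
    tokens = set(doc.lower().split())
    return Counter(frozenset(p) for p in combinations(tokens, 2))

# Toy data: three tweet-length documents vs. one longer document.
short_docs = ["topic model", "model data", "data topic"]
long_doc = "topic model data topic model inference data corpus"

short_signal = sum((cooccurrence_pairs(d) for d in short_docs), Counter())
long_signal = cooccurrence_pairs(long_doc)

# Each two-word document contributes a single pair, so repeated
# co-occurrences are rare; the longer document alone yields many pairs.
print(len(short_signal), len(long_signal))  # → 3 10
```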
Also, I've never quite seen the value of stemming for topic modeling, but I'd argue that preprocessing in general is an important part of any method. That's why I wouldn't necessarily copy every preprocessing step when comparing methods, especially when the methods being compared are quite different.
December 9, 2025 at 9:01 AM
I suspect that a good number of citations stem from the fact that the paper "confirms" a common experience in the CSS community: that embedding-based topic models tend to produce more interpretable topics than LDA _on very short texts_.
December 9, 2025 at 8:56 AM
Thanks for sharing!
May 14, 2025 at 9:39 AM