Website: https://infogrep.it
Online materials: https://catlism.github.io
To a human, it's one. To a corpus tool, it’s often split (😵 + 💫).
And 𝙊𝙉𝙇𝙄𝙉𝙀 ≠ online.
This preprint shows how emojis & homoglyphs challenge tokenisation and distort linguistic evidence.
🔍 arxiv.org/abs/2507.01764
#Emoji #Homoglyphs #CorpusLinguistics #AcademicSky #LangSky
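A minimal sketch of the two effects mentioned in the post, using only Python's standard library. It assumes the emoji in question is the zero-width-joiner sequence U+1F635 U+200D U+1F4AB and that the styled word is written in Mathematical Sans-Serif Bold Italic capitals; the helper names `codepoints` and `fold_homoglyphs` are hypothetical and not taken from the preprint.

```python
import unicodedata

# Assumed example: one visible emoji built from three code points
# (dizzy face + zero width joiner + dizzy symbol).
face = "\U0001F635\u200D\U0001F4AB"

# Assumed example: "ONLINE" in Mathematical Sans-Serif Bold Italic capitals.
styled = "\U0001D64A\U0001D649\U0001D647\U0001D644\U0001D649\U0001D640"


def codepoints(text):
    """List the code points a naive, character-based tool would see."""
    return [f"U+{ord(ch):04X}" for ch in text]


def fold_homoglyphs(text):
    """Map compatibility variants to plain letters (NFKC), then case-fold.

    This only covers homoglyphs that Unicode marks as compatibility
    equivalents; look-alikes from other scripts are left unchanged.
    """
    return unicodedata.normalize("NFKC", text).casefold()


if __name__ == "__main__":
    print(codepoints(face))          # three code points for one glyph
    print(styled == "ONLINE")        # False: different code points
    print(fold_homoglyphs(styled))   # folded to plain lowercase letters
```

Normalising before tokenisation is one possible mitigation; whether it suits a given corpus depends on whether the styling itself is of linguistic interest.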
I love this perspective.
Thanks @switchangel.bsky.social
www.youtube.com/live/mLozqDn...
www.project-syndicate.org/magazine/ai-...
vickiboykis.com/what_are_emb...
www.theverge.com/report/75681...
I made a quick Space to compare VLM OCR with "traditional" OCR using 11k Scottish exam papers from @natlibscot.bsky.social
huggingface.co/spaces/davanstrien/ocr-time-capsule
Here’s how it all falls apart—a 🧵 in 6 figures ⬇️
www.protagonist-science.com/p/how-social...
MBA-brain is real.
I have a preprint I'd like to upload to Computer Science > Computation and Language (cs.CL), but need someone to endorse my account.
Here's the endorsement link: arxiv.org/auth/endorse...
#corpuslinguistics #linguistics
Click on the link below to have a look at the speakers and the workshops of our Summer School!
⬇️⬇️⬇️
www.summerschooldigitalhumanities.unimore.it/2025-edition...
Please retweet.
tinyurl.com/PostdocGNNSNF
🎯 Running experiments until you get a hit
🍒 Cherry-picking your results
🔧 Tweaking your data
➗ Not adjusting for multiple comparisons
www.nature.com/articles/d41...
- a presentation by Fiona Ramage @fionar.bsky.social
youtu.be/dnzRQPOxz1o?...
The event was organised by Edinburgh ReproducibiliTea