scott b. weingart
@scottbot.bsky.social
past: circus performer; historian of science; librarian; chief data officer at NEH.

present: dad; resident scholar at dartmouth; chief technology officer at the library of virginia.

personal account; views solely my own.

https://scottbot.github.io
There are certainly plenty of folks who just imagine handwriting as a content vehicle, as you say. But having followed and participated in (I think?) the same threads you're referring to, I think most of the people involved would agree entirely with this point and most others you make upthread.
November 30, 2025 at 4:19 AM
Reposted by scott b. weingart
No coincidence that I share Wendit Tnce Inf days after this note about machine reading.

Every painstakingly hand-crafted element of this physical book demands to be read. But reading remains elusive.

To human or machine, this book remains a perfect thing stuck forever in the middle distance.
November 27, 2025 at 1:34 PM
Ugh, Allison, not Alison.

Anyway, best book of the last decade by miles.
November 27, 2025 at 12:31 PM
Reposted by scott b. weingart
Don't pronounce a eulogy on paleographers yet. We'll want them around to understand the data we do have, build more open data, work on languages big companies don't care about, and evaluate when systems go wrong. bsky.app/profile/giul...
November 26, 2025 at 3:15 PM
Reposted by scott b. weingart
which I know from personal inspection. What it had was the biggest (n-gram) language model anyone had yet built. @nsaphra.bsky.social et al. have a nice paper on this analogy. arxiv.org/abs/2311.05020
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
November 26, 2025 at 3:15 PM
Reposted by scott b. weingart
As some of the replies to @dancohen.org have pointed out, these OCR capabilities are especially impressive for more recent English, where the language model is strongest. Probably the best analogy for this is early Google Translate, which had a pretty weak translation model, ...
November 26, 2025 at 3:15 PM
Reposted by scott b. weingart
The pile of data that made Gemini's OCR possible was produced by past research! We know examples of OCR/HTR training sets that Google certainly used, so funding them was certainly helpful. bsky.app/profile/scot...
I know this is the funding/research game, and we put a lot of money/time into soon-curtailed paths because one payoff is sometimes all we need, but: it's sobering thinking of all the clever technologies and methodologies that were swept away when fundamentally stupid LLMs came on the scene.
November 26, 2025 at 3:15 PM
Sometimes yes but often no. As I say downthread, "perfect" in this context refers to specific genres of documents.

Also, yes, absolutely the process is as important as the end product. (See, e.g., the thread below.) I'm referring to consequences rather than the most ideal paths forward.
In my brief time at the Library of Virginia, it seems some of the most active and consistent engagement with our collections by non-academics is in collective transcription events. People get really excited, learn a lot, and feel connected to the past! Impossible to overstate the importance of that.
November 26, 2025 at 12:02 PM
For anyone following along, there's a second thread that goes pretty in depth continuing this conversation with several folks involved in organizing hand transcriptions:
I know this is the funding/research game, and we put a lot of money/time into soon-curtailed paths because one payoff is sometimes all we need, but: it's sobering thinking of all the clever technologies and methodologies that were swept away when fundamentally stupid LLMs came on the scene.
November 26, 2025 at 11:58 AM
A professor required this of me during my first semester of grad school and it may be the most (of many) important things I took from that class.
November 25, 2025 at 9:18 PM
As if on cue: bsky.app/profile/benw...

Obviously I agree with you @snblickhan.bsky.social regarding the importance of e.g., transcription curiosity and experiential learning, but consequentiality and importance aren't always aligned.
Sure enough, we got this email from a volunteer on Friday after we announced Gemini integration:

"As a long-term transcriber on From the page I am now wondering about the implications of Gemini 3 (AI) for me - I am feeling particularly discouraged today.
November 25, 2025 at 9:02 PM
Whether we want to use these automated transcriptions, think they're ethical, or whatever, I think we're already there w/r/t immediate consequences. They're already used by individuals, genealogy companies, etc.

(Ref'ing my other thread for people trying to follow this multi-headed conversation.)
Nearly-perfect printed and handwritten text recognition is the most consequential technical contribution to the study of human culture of the last fifteen years, and it's not even close.

It fundamentally changes our (both lay and expert) relationship with the written past.
New issue of my newsletter: "The Writing Is on the Wall for Handwriting Recognition" — One of the hardest problems in digital humanities has finally been solved, and it's a good use of AI
newsletter.dancohen.org/archive/the-...
November 25, 2025 at 8:59 PM
Yeah this is going to cause us (cultural heritage folks, among others) so many headaches in a few years, of the nuclear-fallout-contaminated steel variety. Really not looking forward to that reckoning.
November 25, 2025 at 8:40 PM
I've found confabulations and guesswork can be reduced significantly by role-based prompting, but even still I wouldn't use Gemini(/etc.)-created transcriptions for anything outside of information retrieval indices, or helping me with paleography when I have the image up alongside the transcription.
November 25, 2025 at 8:36 PM
That said, knowing what we do about the success of machine transcription, I wonder what new and interesting ways we can have folks connect with our collections. I don't imagine these transcription events will disappear, but this offers the opportunity to imagine new mutually beneficial entry points.
November 25, 2025 at 8:29 PM
In my brief time at the Library of Virginia, it seems some of the most active and consistent engagement with our collections by non-academics is in collective transcription events. People get really excited, learn a lot, and feel connected to the past! Impossible to overstate the importance of that.
November 25, 2025 at 8:26 PM
Yeah, I chose "soon-curtailed" vs. "dead end" for that reason. Earlier transcription technologies were valuable for (among other things) their use at the time and how they bootstrapped more recent tech, and human/crowd transcriptions are of course still incredibly important.

Still sobering though!
November 25, 2025 at 8:08 PM
I know this is the funding/research game, and we put a lot of money/time into soon-curtailed paths because one payoff is sometimes all we need, but: it's sobering thinking of all the clever technologies and methodologies that were swept away when fundamentally stupid LLMs came on the scene.
November 25, 2025 at 7:35 PM