scott b. weingart
@scottbot.bsky.social
past: circus performer; historian of science; librarian; chief data officer at NEH.

present: dad; resident scholar at dartmouth; chief technology officer at the library of virginia.

personal account; views solely my own.

https://scottbot.github.io
There are certainly plenty of folks who just imagine handwriting as a content vehicle, as you say. But having followed and participated in (I think?) the same threads you're referring to, I think most of the people involved would agree entirely with this point and most others you make upthread.
November 30, 2025 at 4:19 AM
Reposted by scott b. weingart
No coincidence that I share Wendit Tnce Inf days after this note about machine reading.

Every painstakingly hand-crafted element of this physical book demands to be read. But reading remains elusive.

To human or machine, this book remains a perfect thing stuck forever in the middle distance.
November 27, 2025 at 1:34 PM
Ugh, Allison, not Alison.

Anyway, best book of the last decade by miles.
November 27, 2025 at 12:31 PM
Reposted by scott b. weingart
Don't pronounce a eulogy on paleographers yet. We'll want them around to understand the data we do have, build more open data, work on languages big companies don't care about, and evaluate when systems go wrong. bsky.app/profile/giul...
November 26, 2025 at 3:15 PM
Reposted by scott b. weingart
which I know from personal inspection. What it had was the biggest (n-gram) language model anyone had yet built. @nsaphra.bsky.social et al. have a nice paper on this analogy. arxiv.org/abs/2311.05020
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
November 26, 2025 at 3:15 PM
Reposted by scott b. weingart
As some of the replies to @dancohen.org have pointed out, these OCR capabilities are especially impressive for more recent English, where the language model is strongest. Probably the best analogy for this is early Google Translate, which had a pretty weak translation model, ...
November 26, 2025 at 3:15 PM
Reposted by scott b. weingart
The pile of data that made Gemini's OCR possible was produced by past research! We know examples of OCR/HTR training sets that Google certainly used, so funding them was certainly helpful. bsky.app/profile/scot...
I know this is the funding/research game, and we put a lot of money/time into soon-curtailed paths because one payoff is sometimes all we need, but: it's sobering thinking of all the clever technologies and methodologies that were swept away when fundamentally stupid LLMs came on the scene.
November 26, 2025 at 3:15 PM
Sometimes yes but often no. As I say downthread, "perfect" in this context refers to specific genres of documents.

Also, yes, absolutely the process is as important as the end product. (See, e.g., the thread below.) I'm referring to consequences rather than the most ideal paths forward.
In my brief time at the Library of Virginia, it seems some of the most active and consistent engagement with our collections by non-academics is in collective transcription events. People get really excited, learn a lot, and feel connected to the past! Impossible to overstate the importance of that.
November 26, 2025 at 12:02 PM
For anyone following along, there's a second thread that goes pretty in depth continuing this conversation with several folks involved in organizing hand transcriptions:
I know this is the funding/research game, and we put a lot of money/time into soon-curtailed paths because one payoff is sometimes all we need, but: it's sobering thinking of all the clever technologies and methodologies that were swept away when fundamentally stupid LLMs came on the scene.
November 26, 2025 at 11:58 AM
A professor required this of me during my first semester of grad school and it may be the most (of many) important things I took from that class.
November 25, 2025 at 9:18 PM
As if on cue: bsky.app/profile/benw...

Obviously I agree with you @snblickhan.bsky.social regarding the importance of e.g., transcription curiosity and experiential learning, but consequentiality and importance aren't always aligned.
Sure enough, we got this email from a volunteer on Friday after we announced Gemini integration:

"As a long-term transcriber on From the page I am now wondering about the implications of Gemini 3 (AI) for me - I am feeling particularly discouraged today.
November 25, 2025 at 9:02 PM
Whether we want to use these automated transcriptions, think they're ethical, or whatever, I think we're already there w/r/t immediate consequences. They're already used by individuals, genealogy companies, etc.

(Ref'ing my other thread for people trying to follow this multi-headed conversation.)
Nearly-perfect printed and handwritten text recognition is the most consequential technical contribution to the study of human culture of the last fifteen years, and it's not even close.

It fundamentally changes our (both lay and expert) relationship with the written past.
New issue of my newsletter: "The Writing Is on the Wall for Handwriting Recognition" — One of the hardest problems in digital humanities has finally been solved, and it's a good use of AI
newsletter.dancohen.org/archive/the-...
November 25, 2025 at 8:59 PM
Yeah this is going to cause us (cultural heritage folks, among others) so many headaches in a few years, of the nuclear-fallout-contaminated steel variety. Really not looking forward to that reckoning.
November 25, 2025 at 8:40 PM
I've found confabulations and guesswork can be reduced significantly by role-based prompting, but even still I wouldn't use Gemini(/etc.)-created transcriptions for anything outside of information retrieval indices, or helping me with paleography when I have the image up alongside the transcription.
November 25, 2025 at 8:36 PM
That said, knowing what we do about the success of machine transcription, I wonder what new and interesting ways we can have folks connect with our collections. I don't imagine these transcription events will disappear, but this offers the opportunity to imagine new mutually beneficial entry points.
November 25, 2025 at 8:29 PM
In my brief time at the Library of Virginia, it seems some of the most active and consistent engagement with our collections by non-academics is in collective transcription events. People get really excited, learn a lot, and feel connected to the past! Impossible to overstate the importance of that.
November 25, 2025 at 8:26 PM
Yeah, I chose "soon-curtailed" vs. "dead end" for that reason. Earlier transcription technologies were valuable for (among other things) their use at the time and how they bootstrapped more recent tech, and human/crowd transcriptions are of course still incredibly important.

Still sobering though!
November 25, 2025 at 8:08 PM
I know this is the funding/research game, and we put a lot of money/time into soon-curtailed paths because one payoff is sometimes all we need, but: it's sobering thinking of all the clever technologies and methodologies that were swept away when fundamentally stupid LLMs came on the scene.
November 25, 2025 at 7:35 PM