scott b. weingart
@scottbot.bsky.social
past: circus performer; historian of science; librarian; chief data officer at NEH.

present: dad; resident scholar at dartmouth; chief technology officer at the library of virginia.

personal account; views solely my own.

https://scottbot.github.io
Pinned
📌Hi! I'm Scott, a historian of science.

Before DOGE, I helped the US fund the humanities efficiently and impactfully, to reach the breadth of the American public.

Now I help make the Library of Virginia's rich collections and services digitally accessible to all.

Personal account, mostly silly.📌
This is your irregular reminder that Ben Affleck played a(n uncredited) basketball player in the 1992 Buffy the Vampire Slayer movie.
November 29, 2025 at 7:16 PM
Reposted by scott b. weingart
No coincidence that I share Wendit Tnce Inf days after this note about machine reading.

Every painstakingly hand-crafted element of this physical book demands to be read. But reading remains elusive.

To human or machine, this book remains a perfect thing stuck forever in the middle distance.
November 27, 2025 at 1:34 PM
Reposted by scott b. weingart
I have two books, but instead of promoting them, I will promote someone else's.

Allison Parrish's "Wendit Tnce Inf", a book of poems that can never be read—in two ways:
1. The letters are almost but not quite real.
2. The book (56 copies printed) is sold out.

www.aleator.press/releases/wen...
November 27, 2025 at 12:30 PM
Reminder: deadline next Monday.

Apply! Apply! Apply!
The next Historical Network Research conference will be held in Turin, Italy in July 2026, and submissions are now open.

Proposals due December 1. Bursaries available for early career scholars.

This year's theme is "Networks and their Sources." See you there!

hnr2026.sciencesconf.org
HNR2026: The Historical Network Research Conference 2026 (Turin, Italy) - Sciencesconf.org
Call for Papers – Historical Network Research Conference 2026
hnr2026.sciencesconf.org
November 27, 2025 at 12:10 PM
Reposted by scott b. weingart
Don't pronounce a eulogy on paleographers yet. We'll want them around to understand the data we do have, build more open data, work on languages big companies don't care about, and evaluate when systems go wrong. bsky.app/profile/giul...
November 26, 2025 at 3:15 PM
Reposted by scott b. weingart
which I know from personal inspection. What it had was the biggest (n-gram) language model anyone had yet built. @nsaphra.bsky.social et al. have a nice paper on this analogy. arxiv.org/abs/2311.05020
First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models
Many NLP researchers are experiencing an existential crisis triggered by the astonishing success of ChatGPT and other systems based on large language models (LLMs). After such a disruptive change to o...
arxiv.org
November 26, 2025 at 3:15 PM
Reposted by scott b. weingart
As some of the replies to @dancohen.org have pointed out, these OCR capabilities are especially impressive for more recent English, where the language model is strongest. Probably the best analogy for this is early Google Translate, which had a pretty weak translation model, ...
November 26, 2025 at 3:15 PM
Reposted by scott b. weingart
The pile of data that made Gemini's OCR possible was produced by past research! We know examples of OCR/HTR training sets that Google certainly used, so funding them was certainly helpful. bsky.app/profile/scot...
I know this is the funding/research game, and we put a lot of money/time into soon-curtailed paths because one payoff is sometimes all we need, but: it's sobering thinking of all the clever technologies and methodologies that were swept away when fundamentally stupid LLMs came on the scene.
November 26, 2025 at 3:15 PM
Reposted by scott b. weingart
Dan's post is great, as usual! It's exciting to move to full-page models that allow us to engage with the text and avoid mucking around with image processing. But we don't need to pronounce a eulogy on paleographers yet. The War Department and Jane Austen examples are likely known to the LM. 1/
New issue of my newsletter: "The Writing Is on the Wall for Handwriting Recognition" — One of the hardest problems in digital humanities has finally been solved, and it's a good use of AI newsletter.dancohen.org/archive/the-...
The Writing Is on the Wall for Handwriting Recognition
One of the hardest problems in digital humanities has finally been solved
newsletter.dancohen.org
November 26, 2025 at 3:15 PM
Wait wait wait msnbc/msnow hasn't been associated with microsoft since 2005???
November 25, 2025 at 11:30 PM
Reposted by scott b. weingart
It was a remarkable feeling, working on a recent project, to have access to software that transcribed the marginalia as well as the printed text.
Nearly-perfect printed and handwritten text recognition is the most consequential technical contribution to the study of human culture of the last fifteen years, and it's not even close.

It fundamentally changes our (both lay and expert) relationship with the written past.
New issue of my newsletter: "The Writing Is on the Wall for Handwriting Recognition" — One of the hardest problems in digital humanities has finally been solved, and it's a good use of AI newsletter.dancohen.org/archive/the-...
November 25, 2025 at 7:19 PM
Reposted by scott b. weingart
Now this is an overview worth reading #skystorians. I’ve been running table models with pre-trained German language models (with pretty high CER) and it still took my total data entry time down at least 60-70%.
Nearly-perfect printed and handwritten text recognition is the most consequential technical contribution to the study of human culture of the last fifteen years, and it's not even close.

It fundamentally changes our (both lay and expert) relationship with the written past.
New issue of my newsletter: "The Writing Is on the Wall for Handwriting Recognition" — One of the hardest problems in digital humanities has finally been solved, and it's a good use of AI newsletter.dancohen.org/archive/the-...
November 25, 2025 at 7:06 PM
Nearly-perfect printed and handwritten text recognition is the most consequential technical contribution to the study of human culture of the last fifteen years, and it's not even close.

It fundamentally changes our (both lay and expert) relationship with the written past.
New issue of my newsletter: "The Writing Is on the Wall for Handwriting Recognition" — One of the hardest problems in digital humanities has finally been solved, and it's a good use of AI newsletter.dancohen.org/archive/the-...
The Writing Is on the Wall for Handwriting Recognition
One of the hardest problems in digital humanities has finally been solved
newsletter.dancohen.org
November 25, 2025 at 6:14 PM
Reposted by scott b. weingart
MajinBook is a badly-needed catalog for shadow libraries. It provides metadata (e.g., date of first publication, popularity on Goodreads) for over half a million English-language books. arxiv.org/abs/2511.11412 +
MajinBook: An open catalogue of digital world literature with likes
This data paper introduces MajinBook, an open catalogue designed to facilitate the use of shadow libraries--such as Library Genesis and Z-Library--for computational social science and cultural analyti...
arxiv.org
November 21, 2025 at 2:24 PM
Reposted by scott b. weingart
“Blogs are one of the great literary inventions of our time. Coming somewhere between an essay and a diary entry, they are a form of personal journalism that is intimate and immediate... but they also have… a rough-edged informality that breaks down barriers. They are engaging.”
November 12, 2025 at 11:22 AM
Reposted by scott b. weingart
We are so proud of this work. Not only is it the first effort to publish & analyze **open-access data** derived from the entire text contents of digitized @britishlibrary.bsky.social newspapers, it presents a metadata-driven approach to understanding bias in big historical data. #dh #skystorians
November 11, 2025 at 5:07 PM
Reposted by scott b. weingart
!Stop Press! Article on bias in digitised newspaper collections: ’Whose News’, in the new journal of @comphumresearch.bsky.social by Kaspar Beelen, @jonhistorian61.bsky.social, @kmcdono.bsky.social and me. See blog for summary & 🧵 1/7

Article doi.org/10.1017/chr....

Blog is.gd/2IFc30

#dh #c19 🗃️
Whose news? Critical methods for assessing bias in large historical datasets | Computational Humanities Research | Cambridge Core
Whose news? Critical methods for assessing bias in large historical datasets - Volume 1
doi.org
November 11, 2025 at 4:05 PM
Reposted by scott b. weingart
Great news! This is out: Opening the black box of EEBO academic.oup.com/dsh/advance-...
Opening the black box of EEBO
Abstract. Digital archives that cover extended historical periods can create a misleading impression of comprehensiveness while in truth providing access t
academic.oup.com
November 9, 2025 at 10:30 AM
Reposted by scott b. weingart
“And then you have librarians who are experiencing a real existential crisis because they are getting asked by their jobs to promote [AI] tools that produce more misinformation. It's the most, like, emperor-has-no-clothes-type situation that I have ever witnessed.” - Alison Macrina
AI Is Supercharging the War on Libraries, Education, and Human Knowledge
"Fascism and AI, whether or not they have the same goals, they sure are working to accelerate one another."
www.404media.co
November 7, 2025 at 7:15 AM
Reposted by scott b. weingart
How do we navigate gaps and challenges in quantifying and understanding innovation and R&D in the creative sector? Excellent roundup from @suzannerblack.bsky.social on how difficult it is to count and map innovation, and "return on funder investment", in the creative industries.
November 7, 2025 at 12:03 PM
Reposted by scott b. weingart
The first (1955) Danish edition of Ray Bradbury’s FAHRENHEIT 451. Later editions did not convert the title, so this is the only SI-compatible edition! 🎢
November 1, 2025 at 7:55 PM
Reposted by scott b. weingart
Haven't seen any linguistics research on character limits, but perhaps someone who follows me might know of some!
Hey, @gretchenmcc.bsky.social, sorry to at you like this out of the blue, but couldn't think of anyone better to ask.

Are you aware of any research looking at how character limits influence word choice on social media?
November 1, 2025 at 11:10 PM
Reposted by scott b. weingart
A permanent post in my department. Closing date Dec 14th 2025, interviews in March. Please spread #histsci
Assistant Professor in History of Knowledge Pre-1400
Applications are invited for the position of Assistant Professor in History of Knowledge Pre-1400, in the Department of History and Philosophy of Science at the University of Cambridge. Please note
www.cam.ac.uk
October 30, 2025 at 10:07 AM
Reposted by scott b. weingart
As DH grows, it’s increasingly important to publish conference papers, but there hasn’t been a clear venue for that.

So I’m thrilled to share this new home for DH proceedings, which will include CHR papers & more.

Thanks to @taylor-arnold.bsky.social for leading this effort!

bit.ly/ach-anthology
October 29, 2025 at 3:39 PM