Ben Lee
bcgl.bsky.social
Assistant Professor @ the University of Washington iSchool | formerly an Innovator in Residence @ Library of Congress | essays in WIRED, Gawker, The New Republic, Current Affairs, etc.

🌐 www.bcglee.com
4/ What does visual search do? Here’s a visual search for “redacted documents”
November 18, 2025 at 8:19 PM
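The visual search step can be sketched as nearest-neighbor lookup in CLIP's shared embedding space: the text query is embedded and compared against the image embeddings of rendered pages. A minimal sketch using NumPy with random placeholder vectors standing in for the real CLIP embeddings (GovScape itself serves this lookup through FAISS):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stand-ins for CLIP embeddings: in GovScape these would be
# the CLIP image embeddings of rendered PDF pages and the CLIP text
# embedding of a query like "redacted documents".
page_embeddings = rng.normal(size=(1000, 512)).astype("float32")
query_embedding = rng.normal(size=(512,)).astype("float32")

# Normalize so the inner product equals cosine similarity, which is
# how CLIP embeddings are typically compared.
page_embeddings /= np.linalg.norm(page_embeddings, axis=1, keepdims=True)
query_embedding /= np.linalg.norm(query_embedding)

# Rank pages by similarity to the query and take the top 5.
scores = page_embeddings @ query_embedding
top5 = np.argsort(-scores)[:5]
print(top5)
```

With real embeddings, the top-ranked pages would be those whose rendered images CLIP places closest to the query text.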
3/ The full GovScape architecture is detailed in this figure, showing how the client interacts with the server, databases, and indices. We use FAISS indices for both the BGE text embeddings and the CLIP embeddings, and SQLite FTS5 for keyword indexing.
November 18, 2025 at 8:19 PM
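The keyword-indexing side of this architecture can be sketched with Python's standard-library `sqlite3` module (the table name, columns, and sample documents below are illustrative, not GovScape's actual schema; the vector side is handled separately by the FAISS indices):

```python
import sqlite3

# Build an in-memory FTS5 full-text index over extracted page text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(pdf_id, page_text)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("doc-001", "memorandum regarding budget appropriations"),
        ("doc-002", "environmental impact statement, heavily redacted"),
        ("doc-003", "annual report on fisheries management"),
    ],
)

# FTS5 MATCH query, ranked by the built-in BM25 ordering ("rank").
hits = conn.execute(
    "SELECT pdf_id FROM pages WHERE pages MATCH ? ORDER BY rank",
    ("redacted",),
).fetchall()
print(hits)
```

In a hybrid setup like the one in the figure, keyword hits from FTS5 and nearest neighbors from FAISS are retrieved independently and merged by the server before results reach the client.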
2/ The pre-processing pipeline ingests PDFs, renders them, generates CLIP and BGE embeddings of individual pages, and indexes the text. The total compute cost for GovScape's pre-processing pipeline for 10 million PDFs was approximately $1,500. Our code is available at: github.com/bcglee/govsc....
November 18, 2025 at 8:19 PM
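The per-PDF stages described above can be sketched schematically. The render and embed functions below are placeholders (the real pipeline in the linked repo rasterizes pages and embeds them with CLIP for images and BGE for text); only the flow of data through the stages is what this sketch shows:

```python
from dataclasses import dataclass

@dataclass
class PageRecord:
    pdf_id: str
    page_num: int
    image_embedding: list  # stand-in for the CLIP embedding of the page image
    text_embedding: list   # stand-in for the BGE embedding of the page text
    text: str              # raw text, destined for the keyword index

def render_pages(pdf_id):
    # Placeholder: pretend every PDF renders to two pages of text.
    return [(1, f"{pdf_id} page one text"), (2, f"{pdf_id} page two text")]

def embed_image(page_num):
    # Placeholder for a CLIP image embedding.
    return [float(page_num)] * 4

def embed_text(text):
    # Placeholder for a BGE text embedding.
    return [float(len(text))] * 4

def preprocess(pdf_id):
    # One pass of the pipeline: render, embed, collect records for indexing.
    records = []
    for page_num, text in render_pages(pdf_id):
        records.append(
            PageRecord(pdf_id, page_num,
                       embed_image(page_num), embed_text(text), text)
        )
    return records

records = preprocess("doc-001")
print(len(records))
```

Run at scale, each `PageRecord` would feed the FAISS indices (the two embeddings) and the keyword index (the text).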
1/ GovScape is built on top of the End of Term Web Archive (eotarchive.org) and currently contains all renderable PDFs (50 pages or fewer) from the 2020 crawl, documenting the first Trump administration. An overview of GovScape’s search functionality can be found in this diagram.
November 18, 2025 at 8:19 PM
The public demo (digital-collections-explorer.com) enables search across more than 500,000 map images from the Library of Congress's API. For example, search for "tattered and worn map".
July 2, 2025 at 8:56 PM
With our Digital Collections Explorer, a collection steward can spin up a local viewer with just a few lines of code. An overview of the Digital Collections Explorer architecture can be found in this overview figure.

The full codebase is available here: github.com/hinxcode/dig...
July 2, 2025 at 8:56 PM
Today, I’ll be sharing from the public symposium, “AI and the Future of Holocaust Research and Memory,” hosted at @ischool.uw.edu on UW’s campus in Seattle. Wonderful to be here with such incredible colleagues across the world and across disciplines.
May 20, 2025 at 5:01 PM
In consultation with the Geography and Map Division at the Library of Congress, we demonstrate the utility of these embeddings for a range of search & discovery tasks, including natural language search, reverse image search, and multimodal search, like this one:
December 3, 2024 at 8:29 PM
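Because CLIP places images and text in one shared embedding space, a multimodal query can be sketched as combining an image embedding and a text embedding (one common approach: average the normalized vectors and renormalize) before the nearest-neighbor lookup. The vectors below are random placeholders, not the paper's actual embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(v):
    return v / np.linalg.norm(v)

# Placeholder CLIP embeddings for a map collection, plus an example
# query image and query text (in CLIP these share one embedding space).
maps = rng.normal(size=(500, 512))
maps /= np.linalg.norm(maps, axis=1, keepdims=True)
query_image = normalize(rng.normal(size=512))
query_text = normalize(rng.normal(size=512))

# Multimodal query: average the normalized image and text embeddings.
query = normalize(query_image + query_text)

# Reverse image search is the same lookup with query_image alone;
# natural language search uses query_text alone.
scores = maps @ query
best = int(np.argmax(scores))
print(best)
```

All three search modes named in the post reduce to this same cosine-similarity lookup; only how the query vector is formed differs.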
In this paper, we introduce CLIP embeddings for these 562,842 images, as well as a dataset of 10,504 map-caption pairs. Here's an overview of our search implementation, which returns results nearly instantaneously on an M3 MacBook Pro:
December 3, 2024 at 8:29 PM