Ben Lee
@bcgl.bsky.social
Assistant Professor @ the University of Washington iSchool | formerly an Innovator in Residence @ Library of Congress | essays in WIRED, Gawker, The New Republic, Current Affairs, etc.

🌐 www.bcglee.com
Thanks so much! Truly appreciate it!
November 19, 2025 at 7:09 PM
Thanks so much!
November 19, 2025 at 3:09 AM
7/ Lastly, we’d love to hear your feedback on GovScape at [email protected]! For more updates on GovScape, follow: @govscape.bsky.social
November 18, 2025 at 8:19 PM
7/ A particular thank-you to @kdeeds.bsky.social for leading this project with me and for making this possible! And to @yh-huang.bsky.social, who did an incredible job with the front-end and dev-ops!
November 18, 2025 at 8:19 PM
6/ GovScape is the result of a multidisciplinary collaboration, co-led by myself and @kdeeds.bsky.social. We’re enormously grateful to the team: Ying-Hsiang Huang, Claire Gong, Shreya Shaji, Alison Yan, Leslie Harka, @tjowens.bsky.social, @vphill.bsky.social, @shannonshen.bsky.social, and SJ Klein!
November 18, 2025 at 8:19 PM
5/ Interested in learning more? Visit GovScape at: www.govscape.net – try some searches and read the FAQ! You can also watch a demo video here: www.youtube.com/watch?v=mNda...
GovScape: A Tutorial Video
YouTube video by GovScape
www.youtube.com
November 18, 2025 at 8:19 PM
4/ What does visual search do? Here’s a visual search for “redacted documents.”
November 18, 2025 at 8:19 PM
3/ The full GovScape architecture is detailed in this figure, which shows how the client interacts with the server, databases, and indices. We use FAISS to index both the text embeddings and the CLIP embeddings, and SQLite FTS5 for keyword indexing.
November 18, 2025 at 8:19 PM
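A minimal sketch of the two kinds of lookup described in post 3/ above, assuming toy data. The page IDs, texts, and 3-dimensional vectors are hypothetical, and the brute-force cosine search only stands in for a FAISS index; this is not GovScape's actual code. The keyword side uses SQLite's FTS5 through Python's built-in sqlite3 module, which works only if the underlying SQLite build includes FTS5.

```python
import math
import sqlite3

# Hypothetical toy corpus: (page_id, page_text, embedding).
# The 3-d vectors are made up; real CLIP or BGE embeddings are far larger.
PAGES = [
    ("doc1-p1", "budget report for fiscal year 2020", [0.9, 0.1, 0.0]),
    ("doc2-p1", "redacted memo on agency policy", [0.1, 0.9, 0.1]),
    ("doc3-p4", "map of national park trails", [0.0, 0.2, 0.9]),
]


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def vector_search(query_vec, k=2):
    """Brute-force nearest-neighbor search (a stand-in for a FAISS index)."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(emb))), page_id)
        for page_id, _, emb in PAGES
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [page_id for _, page_id in scored[:k]]


def build_keyword_index():
    """Create an in-memory FTS5 table holding the text of each page."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE pages USING fts5(page_id UNINDEXED, body)")
    conn.executemany(
        "INSERT INTO pages (page_id, body) VALUES (?, ?)",
        [(page_id, text) for page_id, text, _ in PAGES],
    )
    return conn


def keyword_search(conn, query):
    """Return page IDs whose text matches the FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT page_id FROM pages WHERE pages MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]
```

For example, `vector_search([0.0, 1.0, 0.0], k=1)` and `keyword_search(build_keyword_index(), "redacted")` both return the page from the hypothetical "doc2" here, one by embedding similarity and the other by keyword match.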
2/ The pre-processing pipeline ingests PDFs, renders them, generates CLIP and BGE embeddings of individual pages, and indexes the text. The total compute cost for GovScape's pre-processing pipeline for 10 million PDFs was approximately $1,500. Our code is available at: github.com/bcglee/govsc....
November 18, 2025 at 8:19 PM
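As a rough worked example of the cost figure in the post above: the quoted totals (approximately $1,500 for 10 million PDFs) imply about $0.00015 per PDF, or roughly $0.15 per thousand PDFs. The helper name below is hypothetical, and the calculation assumes cost scales linearly with the number of PDFs, which the post does not claim.

```python
TOTAL_COST_USD = 1_500   # approximate total quoted in the post
NUM_PDFS = 10_000_000    # number of PDFs processed, per the post


def estimated_cost_usd(num_pdfs):
    """Estimate pre-processing cost, assuming strictly linear scaling."""
    return TOTAL_COST_USD * num_pdfs / NUM_PDFS


if __name__ == "__main__":
    for n in (1, 1_000, 1_000_000):
        print(f"{n:>9,} PDFs: about ${estimated_cost_usd(n):,.5f}")
```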
2/ GovScape is built on top of the End of Term Web Archive (eotarchive.org) and currently contains all renderable PDFs (50 pages or fewer) from the 2020 crawl, documenting the first Trump administration. An overview of GovScape’s search functionality can be found in this diagram.
November 18, 2025 at 8:19 PM
Thanks so much, I appreciate it!
October 11, 2025 at 9:37 PM
Thanks so much for your kind words - I really appreciate them, and I'm glad that the piece resonated with you! And thank you for your work, too, in having volunteered as a docent and being the family archivist!
October 11, 2025 at 9:37 PM
Thanks so much, Scott - very kind of you to say!
October 1, 2025 at 1:49 AM
I really appreciate your kind words, @dhutchinson.bsky.social!
September 29, 2025 at 5:51 AM
Thanks so much, @jenserventi.bsky.social - I truly appreciate it!
September 29, 2025 at 5:50 AM