Heather Froehlich
@heatherfro.bsky.social
supporting researchers counting words in various ways with computers at university of arizona libraries; increasingly displaced new englander
Reposted by Heather Froehlich
New issue of my newsletter: "The Writing Is on the Wall for Handwriting Recognition" — One of the hardest problems in digital humanities has finally been solved, and it's a good use of AI. newsletter.dancohen.org/archive/the-...
The Writing Is on the Wall for Handwriting Recognition
One of the hardest problems in digital humanities has finally been solved
newsletter.dancohen.org
November 25, 2025 at 4:35 PM
Reposted by Heather Froehlich
I’ve been running around asking tech execs and academics if language was the same as intelligence for over a year now - and, well, it isn’t. @benjaminjriley.bsky.social explains how the bubble is built on ignoring cutting-edge research into the science of thought www.theverge.com/ai-artificia...
November 25, 2025 at 1:54 PM
Reposted by Heather Froehlich
The UW Center for an Informed Public is looking for postdocs (for 2026-2028) from across diverse disciplines whose research sheds light on the challenges of our modern information environment, promotes civic health, and/or helps people/communities navigate online spaces: apply.interfolio.com/177901
Apply - Interfolio
apply.interfolio.com
November 24, 2025 at 6:26 PM
Reposted by Heather Froehlich
Journalist challenge: Use “Machine Learning” when you mean machine learning and “LLM” when you mean LLM. Ditch “AI” as a catch-all term, it’s not useful for readers and it helps companies trying to confuse the public by obscuring the roles played by different technologies. 🧪
November 22, 2025 at 4:50 PM
Reposted by Heather Froehlich
I need everyone, esp anyone working in education or tech (but really everyone) to WATCH THIS CLIP of @drtanksley.bsky.social discussing the technologies infiltrating our schools & psyches and how she is addressing it with our young people. youtu.be/5mtcSL4S3HQ
Howard University AI Panel
YouTube video by Tiera Tanksley
youtu.be
November 22, 2025 at 1:43 PM
Reposted by Heather Froehlich
Seth Rockman: 'Experiential research as a method for IDing questions you didn’t know to ask, rather than providing answers you wouldn’t have had...an experiential research method lets us envision more things we want to know about the past' (experiential=do something the way it was done in the past)
November 22, 2025 at 1:32 PM
Reposted by Heather Froehlich
Here's a short thing on adversarial language, following yesterday's poetry news. It argues for interpretability work undertaken via literary studies and tries to acknowledge some difficulties this would entail.

For Those Who May Find Themselves on the Red Team: tylershoemaker.info/docs/shoemak...
November 21, 2025 at 8:39 PM
Funding: @bloomberg.com is pleased to announce the 2026-2027 edition of the Bloomberg Data Science Ph.D. Fellowship Program www.techatbloomberg.com/bloomberg-da...
Data Science Ph.D. Fellowship | Bloomberg LP
Apply now for the Bloomberg Data Science Ph.D. Fellowship program. Applications are due by April 28, 2023 for the 2023-2024 academic year.
www.techatbloomberg.com
November 21, 2025 at 3:56 PM
Reposted by Heather Froehlich
For those of you curious about folks who are not enthralled with AI companies but who also want to preserve broad fair use, free exchange of ideas/citation/building on what’s come before, plus are anti-monopoly, this group is good, and this newsletter interesting
Suno, Yout, Perplexity AI and §1201: AI Training and another piece of the DMCA
“No person shall circumvent a technological measure that effectively controls access to a work protected under this title.” 17 U.S.C.
open.substack.com
November 21, 2025 at 3:23 PM
Reposted by Heather Froehlich
The word SNEEZE used to be FNESE, as in "He speketh in his nose And fneseth faste" (Canterbury Tales)

FNESE faded out in the 15thC, superseded by NESE/NEEZE. Then an s- was added, maybe to strengthen it or to align with other nose-related sn- words

Anyway I think we should bring FNESE back
Man, everything is so bleak, anyone got a fun fact or little bit of trivia they want to share
November 21, 2025 at 10:33 AM
Reposted by Heather Froehlich
We are excited to welcome Dorothy Berry as the speaker for our 2025 annual lecture, "How Users Imagine Archival Research", on December 10th. Register now: https://edin.ac/4pfKrDp #EdCDCS Chairing: Melissa Terras
November 21, 2025 at 1:01 PM
Reposted by Heather Froehlich
This study shows that using poems to jailbreak LLMs is... super effective? What the heck.
November 20, 2025 at 5:36 PM
Reposted by Heather Froehlich
It’s all happening: Shakespeare in the Kitchen is slated for publication in April 2026 🍽️📗🎉 www.routledge.com/Shakespeare-...
Shakespeare in the Kitchen
Audiences and scholars alike have long remarked that Shakespeare’s poems and plays record the pleasures and perils of the table. Shakespeare in the Kitchen asks what Shakespeare’s works can tell us ab...
www.routledge.com
November 20, 2025 at 5:04 PM
Reposted by Heather Froehlich
📣 Really proud to announce the publication of Reframing Failure in Digital Scholarship, an #OpenAccess collection of essays co-edited with @amsichani.bsky.social and published by @uolpress.bsky.social that examines the role of failure in #DH and research more broadly

@sas-news.bsky.social
Reframing Failure in Digital Scholarship - University of London Press
Failure is ordinary. From technological failures and computational obsolescence to rejected applications and challenging collaborations, failure is an unavoidable part of any scholarly endeavour. This...
uolpress.co.uk
November 20, 2025 at 9:26 AM
Two weeks ago I gave a talk at Australian National Uni that included a list of things I would do with a Sands & Mac volume (1910) and ... THIS WAS ONE OF THEM
Love this so much
Good to hear today that my new Sands & Mac is already being used by front-of-house librarians at the SLV to help people with their family history queries. https://updates.timsherratt.org/2025/11/12/a-new-way-of-searching.html
In the fortnight I spent onsite at the State Library of Victoria, ‘Sands & Mac’ was mentioned many times. And no wonder. The Sands & McDougall’s directories are a goldmine for anyone researching family, local, or social history. They list thousands of names and addresses, enabling you to find individuals, and explore changing land use over time. When people ask the SLV’s librarians, ‘What can you tell me about the history of my house?’, Sands & Mac is one of the first resources consulted.

The SLV has digitised 24 volumes of Sands & Mac, one every five years from 1860 to 1974. You can browse the contents of each volume in the SLV image viewer, using the partial contents listing to help you find your way to sections of interest. To search the full text content you need to use the PDF version, either in the built-in viewer, or by downloading the PDF. There’s a handy guide to using Sands & Mac that explains the options. **However, there’s currently no way of searching across all 24 volumes, so as part of my residency at the SLV LAB, I thought I’d make one!**

**Try it now!**

My new Sands & Mac database follows the pattern I’ve used previously to create fully-searchable versions of the NSW Post Office directories, Sydney telephone directories, and Tasmanian Post Office directories. Every line of text is saved to a database, so a single query searches for entries across all volumes. You can also use advanced search features like wildcards and boolean operators.

Search across all 24 volumes!

Once you’ve found a relevant entry you can view it in context, alongside a zoomable image of the page. You can even use Zotero to save individual entries to your own research database. This blog post from the Everyday Heritage project describes how the Tasmanian directories have been used to map Tasmania’s Chinese population.

View each entry in context! (Here's my Dad building his first house in Beaumaris in the 1950s.)

There’s still a few things I’d like to try, such as making use of the table of contents information for each volume. I’d also like to create some additional entry points to take users directly to listings for individual suburbs (maybe even streets!). Each volume has a directory of suburbs, so it would be a matter of extracting and cleaning the data and linking the entries to digitised pages. Certainly possible, but I don’t think I’ll have time to get it all done before the end of my residency. Perhaps I’ll try to get at least one volume done to demonstrate how it might work, and the value it would add. As I was writing this blog post I also realised there’s a dataset of businesses extracted from the Sands & Mac, so I need to think about how I can use that as well!

## Technical information follows…

I’ve documented the process I used to create fully-searchable versions of the Tasmanian and NSW directories in the GLAM Workbench. I followed a similar method for Sands and Mac, though with a few dead-ends and discoveries along the way.

### Downloading the PDFs

I assumed that it would be easiest to work from the PDF versions of each volume, as I’d done for Tasmania. So I set about finding a way to download them all. There’s only 24 volumes, so I _could_ have downloaded them manually, but where’s the fun in that? I started with a CSV file listing the Sands & Mac volumes that I downloaded from the catalogue. This gave me the Alma identifiers for each volume.
To download the PDFs I needed two more identifiers, the `IE` identifier assigned to each digitised item, and a file identifier that points to the PDF version of the item. The `IE` identifier can be extracted from the item’s MARC record, as I described in my post on exploring urls. The PDF file identifier was a bit more difficult to track down. The PDF links in the image viewer are generated dynamically, so the data had to be coming from somewhere. Eventually I found that the viewer loaded a JSON file with all sorts of useful metadata in it! The url to download the JSON file is: `https://viewerapi.slv.vic.gov.au/?entity=[IE identifier]&dc_arrays=1`. In the `summary` section I found identifiers for `small_pdf` and `master_pdf`. I could then use these identifiers to construct urls to download the PDFs themselves: `https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet?dps_func=stream&dps_pid=[PDF id]`

Once I had the PDFs I used PyMuPDF to extract all the text and images. As I suspected the text wasn’t really fit for purpose. The OCR was ok, but the column structures were a mess. Because I wanted to index each entry individually, it was important to try and get the columns represented as accurately as possible. The images in the small PDFs were already bitonal, so I started feeding them to Tesseract to see if I could get better results. After a bit of tweaking, things were looking pretty good. But when I came to compile all the data, I realised there was a potential problem matching the PDF pages to the images available through IIIF. I found one case where some pages were missing from the PDF, and another couple where the page order was different. As I was looking around for a solution, I realised that those JSON files I downloaded to get the PDF identifiers also included links to ALTO XML files that contain all the original OCR data (before it got mangled by the PDF formatting). There was one ALTO file for every page. Even better, the JSON linked the identifiers for the text and the image together – no more page mismatches!

### Downloading the ALTO files

Let’s start this again, shall we? After wasting several days futzing about with the PDFs, I decided to download all the ALTO files and extract the text from them. As I downloaded each XML file, I also grabbed the corresponding image identifier from the JSON and included both identifiers in the file name for safe keeping.

The ALTO files break the text down by block, line, and word. To extract the text, I just looped through every line, joining the words back together as a string, and writing the result to a new text file – one for each page. It’s worth noting that the ALTO files include _all_ the positional data generated by the OCR process, so you have the size and position of every word on every page. I just pulled out the text, but there are many more interesting things you could do…
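To make the fetch-and-extract step concrete, here is a rough Python sketch. The two url patterns are quoted from the post above, but the helper names, the way the viewer JSON would be navigated, and the assumption that other file identifiers resolve through the same delivery url are mine; check a real response before relying on any of it. The ALTO parsing simply joins each TextLine's String elements back into a line of text, as described.

```python
import xml.etree.ElementTree as ET

import requests

# Sketch only: url patterns come from the post; the JSON structure (the
# "summary" keys and where the ALTO/PDF identifiers live) should be verified
# against an actual response for one volume.

VIEWER_API = "https://viewerapi.slv.vic.gov.au/"
DELIVERY = "https://rosetta.slv.vic.gov.au/delivery/DeliveryManagerServlet"


def get_viewer_metadata(ie_id):
    """Fetch the JSON metadata the SLV image viewer loads for a digitised item."""
    response = requests.get(VIEWER_API, params={"entity": ie_id, "dc_arrays": 1})
    response.raise_for_status()
    return response.json()


def file_url(file_id):
    """Build a download url from a file identifier, following the PDF url pattern quoted above."""
    return f"{DELIVERY}?dps_func=stream&dps_pid={file_id}"


def extract_lines(alto_xml):
    """Join the words on each ALTO TextLine back into a plain string, one per page line."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if element.tag.endswith("TextLine"):
            words = [
                child.attrib.get("CONTENT", "")
                for child in element
                if child.tag.endswith("String")
            ]
            lines.append(" ".join(words))
    return lines
```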
### Assembling and publishing the database

From here on everything pretty much followed the pattern of the NSW and Tasmanian directories. I looped through each volume, page, and line of text, adding the text and metadata to a SQLite database using sqlite_utils. I then indexed the text for full-text searching. At the same time I populated a metadata file with titles, urls, and a few configuration details. The metadata file is used by Datasette to fill in parts of the interface. I made some minor changes to the Datasette template I used for the other directories. In particular, I had to update the urls that loaded the IIIF images into the OpenSeadragon viewer. But it mostly just worked. It’s so nice to be able to reuse existing patterns! Finally, I used Datasette’s `publish` command to push everything to Google Cloudrun.

The final database contains details of more than 50,000 pages, and over 19 million lines of text! It weighs in at about 1.7gb. The Cloudrun service will ‘scale to zero’ when not in use. This saves some money and resources, but means it can take a little while to spin up. Once it’s loaded, it’s very fast. My original post on the Tasmanian directories included a little note on costs, if you’re interested.

## More information

The notebooks I used are on GitHub:

* Download Sands and Mac PDFs and OCR text
* Load data from the Sands and Mac directories into an SQLite database (for use with Datasette)

Here are some posts about the NSW and Tasmanian directories:

* Making NSW Postal Directories (and other digitised directories) easier to search with the GLAM Workbench and Datasette (September 2022)
* From 48 PDFs to one searchable database – opening up the Tasmanian Post Office Directories with the GLAM Workbench (September 2022)
* Where’s 1920? Missing volume added to Tasmanian Post Office Directories! (September 2024)
* Six more volumes added to the searchable database of Tasmanian Post Office Directories! (November 2024)
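As a coda to the "Assembling and publishing the database" section above, here is a minimal sketch of what the sqlite_utils step might look like. The table name, column names, example row, and the iter_page_lines() loader are hypothetical placeholders, not the actual schema or code behind the Sands & Mac database.

```python
import sqlite_utils


def iter_page_lines():
    """Hypothetical loader: yields (volume, page, line_number, text) tuples
    from the per-page text files extracted from the ALTO XML."""
    yield ("1910", 253, 1, "Smith Jno., 1 Example st, Carlton")  # placeholder row


db = sqlite_utils.Database("sands-and-mac.db")

# One row per line of OCR text, with enough metadata to link back to the page image.
db["lines"].insert_all(
    {"volume": volume, "page": page, "line": line_number, "text": text}
    for volume, page, line_number, text in iter_page_lines()
)

# Index the text column so full-text queries work across all volumes in Datasette.
db["lines"].enable_fts(["text"])

# Publishing then follows the post, roughly:
#   datasette publish cloudrun sands-and-mac.db --service=sands-and-mac -m metadata.json
# (exact options will depend on your Google Cloud setup).
```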
updates.timsherratt.org
November 20, 2025 at 2:22 AM
Reposted by Heather Froehlich
My PhD dissertation is now available for download to anyone in the world 🙂

‘Learning To Talk to Generative AI Chatbots’: A Corpus Study of Generative AI Prompts, an Emerging Genre for AI Literacy

repository.arizona.edu/handle/10150...
‘Learning To Talk to Generative AI Chatbots’: A Corpus Study of Generative AI Prompts, an Emerging Genre for AI Literacy
repository.arizona.edu
November 19, 2025 at 8:33 PM
Reposted by Heather Froehlich
if you are considering submitting an application for this position, you still have just under 2 weeks to do so

any and all suitable candidates, please apply. everyone else, pls share with your networks
📣 I am hiring a postdoc! aial.ie/hiring/postd...

applications from suitable candidates that are passionate about investigating the use of genAI in public service operations with the aim of keeping governments transparent and accountable are welcome

pls share with your networks
November 19, 2025 at 2:44 PM
Reposted by Heather Froehlich
📢Here's a fully funded 4-year PhD position at Leiden within the ERC project LangPro led by Dr Alisa van de Haar, and co-supervised by yours truly, on professional opportunities for women in the early modern language sector bit.ly/47Y3hYI

Apply by 15 Feb. 2026; starting date 1 Aug. 2026
PhD position, project: LangPro Women in the Early Modern Language Sector
careers.universiteitleiden.nl
November 18, 2025 at 4:22 PM
Reposted by Heather Froehlich
1/ Announcing GovScape – a public search system for 10 million U.S. government PDFs (70 million pages)! GovScape offers visual search, semantic text search, and keyword search. Explore below:

Website: www.govscape.net
ArXiv link: arxiv.org/abs/2511.11010
www.govscape.net
November 18, 2025 at 8:19 PM
Reposted by Heather Froehlich
📢 The #CHR2025 proceedings are out!

97 papers, ~1600 pages of computational humanities 🔥 Now published via the new Anthology of Computers and the Humanities, with DOIs for every paper.

🔗 anthology.ach.org/volumes/vol0...

And don’t forget: registration closes tomorrow (20 Nov)!
Edited by Taylor Arnold, Margherita Fantoli, and Ruben Ros
anthology.ach.org
November 19, 2025 at 12:53 PM
Reposted by Heather Froehlich
Solange Knowles’ Saint Heron has launched a free digital archival library of literature by Black and brown authors, poets, and artists. Readers can borrow rare and out-of-print books for up to 45 days
Solange Opens Free Digital Library Of Rare Black Books
Solange has launched a digital library archive of Black and brown authors where readers can borrow books at no cost.
peopleofcolorintech.com
November 18, 2025 at 11:58 PM
I may have read this paper a few times and I may have been very happy to see it come out in print
November 18, 2025 at 6:20 PM
I have been away for 3 weeks and the mail finally came - with @dorothyjberry.bsky.social’s new book from @wehere.bsky.social. Can’t wait to get stuck in
November 18, 2025 at 5:35 PM