Kyle Lo
@kylelo.bsky.social
6.5K followers
590 following
500 posts
language model pretraining research @ai2.bsky.social, Co-lead of Data for OLMo w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him,🧋 kyleclo.com
Kyle Lo
@kylelo.bsky.social
· 10d
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
The success of the Adam optimizer on a wide array of architectures has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our theoretical understanding o...
arxiv.org
Kyle Lo
@kylelo.bsky.social
· Sep 4
The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America
Chronicling America is a product of the National Digital Newspaper Program, a partnership between the Library of Congress and the National Endowment for the Humanities to digitize historic newspapers....
arxiv.org
Kyle Lo
@kylelo.bsky.social
· Sep 4
Decomposing Complex Queries for Tip-of-the-tongue Retrieval
When re-finding items, users who forget or are uncertain about identifying details often rely on creative strategies for expressing their information needs -- complex queries that describe content ele...
arxiv.org