Kyle Lo
@kylelo.bsky.social
language model pretraining @ai2.bsky.social, co-lead of data research w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him,🧋 kyleclo.com
Yay congrats!!
November 7, 2025 at 5:18 PM
correct framing can make or break research contributions
November 6, 2025 at 12:32 AM
apply here: job-boards.greenhouse.io/thealleninst...

i answer some FAQs on my site: kyleclo.com/mentorship/
Research Internship, OLMo
Seattle, WA
job-boards.greenhouse.io
November 5, 2025 at 11:11 PM
🙌🏻🙌🏻🙌🏻
November 5, 2025 at 1:56 AM
thanks for explaining & sorry it's come to this 😮‍💨

curious about your thoughts on other measures, like restricting such pieces to senior authors with a publication record in the surveyed area?
November 1, 2025 at 8:17 PM
more stories from scholar land 😅

google scholar (gs) tends to have higher cite counts than semantic scholar (s2)

s2 clusters copies of the same paper & each cluster grants only +1 citation. without proper clustering, each version (eg, preprint vs published) grants its own citation.

sadly, users can be unhappy when s2 cite counts are lower cuz of this😥
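a minimal sketch of the counting difference described above, with made-up paper & cluster ids (not s2's actual pipeline):

```python
from collections import defaultdict

# hypothetical citation records: (copy of the citing work, cited paper)
# the citing work exists as two copies: an arxiv preprint & a published version
citations = [
    ("citer_preprint", "olmo"),
    ("citer_published", "olmo"),   # same citing work, different copy
    ("other_citer", "olmo"),
]

# clustering maps every copy of a paper to one cluster id
cluster_of = {
    "citer_preprint": "citer",
    "citer_published": "citer",
    "other_citer": "other_citer",
}

# without clustering: every copy grants its own citation -> olmo gets 3
unclustered = defaultdict(int)
for copy_id, cited in citations:
    unclustered[cited] += 1

# with clustering: each cluster grants only +1 citation -> olmo gets 2
clustered = defaultdict(set)
for copy_id, cited in citations:
    clustered[cited].add(cluster_of[copy_id])

print(unclustered["olmo"])      # 3
print(len(clustered["olmo"]))   # 2
```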
October 28, 2025 at 12:58 AM
lol yea

also not widely known but a core difference between gscholar & semantic scholar (s2)

gscholar separates the UI from the data, so when you merge papers, the change is local to your own page & doesn't propagate to your coauthors

s2 updates the underlying data, so the UI reflects ground truth for all users
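roughly, as a toy sketch (made-up structures, not either product's real data model):

```python
# gscholar-style: shared records stay untouched; a merge lives in a
# per-user overlay, so only my profile page reflects it
shared_records = {"p1": "OLMo (preprint)", "p2": "OLMo (published)"}
my_overlay = {"p2": "p1"}        # my page shows one merged entry
coauthor_overlay = {}            # coauthors still see two entries

# s2-style: the merge edits the shared record itself,
# so every user's page reads the same ground truth
shared_clusters = {"p1": "olmo", "p2": "olmo"}
```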
October 27, 2025 at 6:26 PM
nice read, thx for sharing! I think the piece could use a follow-up / complement discussing the misaligned incentives that push scientists to compete rather than collaborate (notably, the section on data fragmentation)
October 12, 2025 at 3:03 AM
lol so much love for prepost-postpre training
October 9, 2025 at 5:13 PM
any other fans of pre-pretraining?
October 9, 2025 at 2:53 PM
same flight lol I just got to airport way too early
October 6, 2025 at 12:12 PM
ya totes, see u there!
October 2, 2025 at 11:34 PM
synthetic data mimics real data's rough shape, modality, types, schema, etc., but with fake values. models these days are quite proficient at operating over this kind of data & generating reasonable code; the main contribution here is the system design that replaces the repetitive exploratory data workflow
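as a tiny sketch of "same schema, fake values" (made-up column names & generator, not the actual system):

```python
import random
import string

# schema mirrors the real table's columns & types; all values are fabricated
schema = {"patient_id": "str", "age": "int", "icd10_code": "str"}

def fake_value(dtype: str):
    if dtype == "int":
        return random.randint(0, 99)
    return "".join(random.choices(string.ascii_uppercase + string.digits, k=6))

# a few synthetic rows with the real data's shape but none of its values
synthetic_rows = [
    {col: fake_value(dtype) for col, dtype in schema.items()}
    for _ in range(5)
]
print(synthetic_rows[0])
```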
October 2, 2025 at 6:01 PM
hehe i didnt do anythin!

the core is data voyager (arxiv.org/abs/2402.13610) but with a local LM instead of GPT

it generates code (map-reduce-filter) that transforms the data (csvs); a federated platform executes it & returns output back to the system. the system repeatedly interprets the output + generates more code
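very roughly, the loop looks something like this (hypothetical function & parameter names; the real system's interfaces differ):

```python
def discovery_loop(question: str, csv_paths: list[str], lm, platform, max_steps: int = 5):
    """Sketch of the generate -> execute -> interpret loop over csv tables.

    `lm` (a local language model client) and `platform` (the federated
    execution layer) are stand-ins, not real APIs.
    """
    context = question
    for _ in range(max_steps):
        # the LM writes a map/reduce/filter-style transform over the csvs
        code = lm.generate_code(context, csv_paths)
        # the federated platform runs it where the data lives & returns output
        output = platform.execute(code, csv_paths)
        # the LM interprets the output & folds it into the next round
        context = lm.interpret(context, output)
    return context
```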
Data-driven Discovery with Large Generative Models
With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit th...
arxiv.org
October 2, 2025 at 6:01 PM
nice post! reminds me of this ICLR 2023 paper that also had a short discussion of other architectures not showing as big a diff between adam/sgd as seen w/ modern transformers arxiv.org/abs/2304.139...
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
The success of the Adam optimizer on a wide array of architectures has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our theoretical understanding o...
arxiv.org
October 2, 2025 at 5:01 PM