Kyle Lo
@kylelo.bsky.social
language model pretraining @ai2.bsky.social, co-lead of data research w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him,🧋 kyleclo.com
correct framing can make or break research contributions
November 6, 2025 at 12:32 AM
apply here: job-boards.greenhouse.io/thealleninst...
i answer some FAQs on my site: kyleclo.com/mentorship/
Research Internship, OLMo
Seattle, WA
job-boards.greenhouse.io
November 5, 2025 at 11:11 PM
thanks for explaining & sorry it's come to this 😮💨
curious your thoughts on other measures like restricting such pieces to senior authors who have a publication record in the surveyed area?
November 1, 2025 at 8:17 PM
more stories from scholar land 😅
gs tends to have higher cite counts than s2
s2 clusters copies of the same paper & each cluster grants only +1 citation. without proper clustering, each copy of a citing paper (eg, preprint vs published version) grants its own citation.
sadly, users can be unhappy when s2 cite counts come out lower cuz of this 😥
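toy sketch of the counting difference (hypothetical paper ids, not the actual s2 pipeline):

from collections import defaultdict

# each row: (copy of a citing paper, paper it cites)
references = [
    ("B-arxiv", "A"),  # preprint copy of B cites A
    ("B-acl", "A"),    # published copy of B cites A again
    ("C", "A"),
]

# which copies belong to the same citing paper
cluster_of = {"B-arxiv": "B", "B-acl": "B", "C": "C"}

# no clustering (gs-like): every copy grants its own citation
naive = defaultdict(int)
for copy, cited in references:
    naive[cited] += 1

# with clustering (s2-like): each citing cluster grants at most +1
granted_by = defaultdict(set)
for copy, cited in references:
    granted_by[cited].add(cluster_of[copy])

print(naive["A"])            # 3
print(len(granted_by["A"]))  # 2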
October 28, 2025 at 12:58 AM
lol yea
also not widely known but a core difference between gscholar & semantic scholar (s2)
gscholar separates UI & data, so when you merge papers, the change is local to your profile page & your coauthors never see it
s2 updates the underlying data, so the UI reflects ground truth for all users
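rough sketch of the two merge models (made-up classes, not either site's actual code):

class GScholarStyle:
    """merges live in a per-user view layer; shared records stay untouched"""
    def __init__(self, records):
        self.records = records   # shared data, never edited here
        self.my_merges = {}      # visible only on my profile page
    def merge(self, dup_id, canonical_id):
        self.my_merges[dup_id] = canonical_id   # coauthors never see this

class S2Style:
    """merges edit the underlying record, so every user's view updates"""
    def __init__(self, records):
        self.records = records
    def merge(self, dup_id, canonical_id):
        self.records[dup_id] = canonical_id     # ground truth for everyone

so the same gscholar merge has to be redone by every coauthor, while one s2 fix propagates to everyone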
October 27, 2025 at 6:26 PM
lol also arxiv.org/abs/2402.04607
Google Scholar is manipulatable
Citations are widely considered in scientists' evaluation. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and citation ...
arxiv.org
October 26, 2025 at 6:38 AM
nice read thx for sharing! I think the piece could use a follow up / complement discussing misaligned incentives that push scientists to compete rather than collaborate (notably, the section on data fragmentation)
October 12, 2025 at 3:03 AM
lol so much love for prepost-postpre training
October 9, 2025 at 5:13 PM
any other fans of pre-pretraining?
October 9, 2025 at 2:53 PM
same flight lol I just got to airport way too early
October 6, 2025 at 12:12 PM
ya totes, see u there!
October 2, 2025 at 11:34 PM
synthetic data mimics real data's rough shape, modality, types, schema, etc. but with fake values. models these days are quite proficient at operating over this kind of data & generating reasonable code; the main contrib here is system design to replace the repetitive exploratory data workflow
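tiny example of what schema-matching synthetic data can look like (made-up schema & value fakers):

import csv, io, random

# same columns & types as the real csv, but every value is fake
schema = [("patient_id", "int"), ("age", "int"), ("diagnosis", "str")]
fake = {
    "int": lambda: random.randint(0, 100),
    "str": lambda: random.choice(["A", "B", "C"]),
}

rows = [{name: fake[typ]() for name, typ in schema} for _ in range(5)]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=[name for name, _ in schema])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())  # safe for the model to explore & write code against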
October 2, 2025 at 6:01 PM
hehe i didnt do anythin!
core is data voyager (arxiv.org/abs/2402.13610) but with a local LM instead of GPT
it generates code (map-reduce-filter) that transforms data (csvs); a federated platform executes it & returns output back to the system, which repeatedly interprets + generates more code
Data-driven Discovery with Large Generative Models
With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit th...
arxiv.org
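minimal sketch of that generate-execute-interpret loop (hypothetical callables, not the actual data voyager code):

def discovery_loop(lm, execute_on_platform, question, max_steps=10):
    """lm: prompt str -> text; execute_on_platform: code str -> output str,
    run by the federated platform where the csvs actually live"""
    history = [f"goal: {question}"]
    for _ in range(max_steps):
        # 1. LM writes the next map/reduce/filter program over the csvs
        code = lm("\n".join(history) + "\nwrite the next analysis step:")
        # 2. platform executes it & returns only the output
        output = execute_on_platform(code)
        history.append(f"code: {code}\noutput: {output}")
        # 3. LM interprets the output & decides whether to keep going
        if "DONE" in lm("\n".join(history) + "\nreply DONE if answered:"):
            break
    return history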
October 2, 2025 at 6:01 PM
nice post! reminds me of this ICLR2023 paper, which also has a short discussion of other architectures not showing as big a diff between adam/sgd as modern transformers do arxiv.org/abs/2304.139...
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
The success of the Adam optimizer on a wide array of architectures has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our theoretical understanding o...
arxiv.org
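the updates being contrasted, roughly (toy numpy sketch, not the paper's code):

import numpy as np

def sgd_step(theta, grad, lr=0.1):
    return theta - lr * grad           # step scales with gradient magnitude

def sign_step(theta, grad, lr=0.1):
    return theta - lr * np.sign(grad)  # only the gradient's sign matters

# adam with beta1 = beta2 = 0 updates by lr * g / (|g| + eps), which is
# basically sign descent; that's the candidate explanation for adam's
# edge on transformers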
October 2, 2025 at 5:01 PM