Kyle Lo
@kylelo.bsky.social
language model pretraining @ai2.bsky.social, co-lead of data research w/ @soldaini.net, statistics @uw, open science, tabletop, seattle, he/him,🧋 kyleclo.com
Yay congrats!!
November 7, 2025 at 5:18 PM
correct framing can make or break research contributions
November 6, 2025 at 12:32 AM
apply here: job-boards.greenhouse.io/thealleninst...

i answer some FAQs on my site: kyleclo.com/mentorship/
Research Internship, OLMo
Seattle, WA
job-boards.greenhouse.io
November 5, 2025 at 11:11 PM
🙌🏻🙌🏻🙌🏻
November 5, 2025 at 1:56 AM
thanks for explaining & sorry it's come to this 😮‍💨

curious about your thoughts on other measures, like restricting such pieces to senior authors with a publication record in the surveyed area?
November 1, 2025 at 8:17 PM
more stories from scholar land 😅

google scholar (gs) tends to have higher cite counts than semantic scholar (s2)

s2 clusters copies of the same paper & each cluster grants only +1 citation. without proper clustering, each version (eg, preprint vs published) grants its own citation.

sadly, users can be unhappy when s2 cite counts are lower cuz of this😥
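a minimal sketch of the counting difference described above, with made-up paper & cluster ids (not s2's actual pipeline):

```python
from collections import defaultdict

# hypothetical citation records: (copy of the citing work, cited paper)
# the citing work exists as two copies: an arxiv preprint & a published version
citations = [
    ("citer_preprint", "olmo"),
    ("citer_published", "olmo"),   # same citing work, different copy
    ("other_citer", "olmo"),
]

# clustering maps every copy of a paper to one cluster id
cluster_of = {
    "citer_preprint": "citer",
    "citer_published": "citer",
    "other_citer": "other_citer",
}

# without clustering: every copy grants its own citation -> olmo gets 3
unclustered = defaultdict(int)
for copy_id, cited in citations:
    unclustered[cited] += 1

# with clustering: each cluster grants only +1 citation -> olmo gets 2
clustered = defaultdict(set)
for copy_id, cited in citations:
    clustered[cited].add(cluster_of[copy_id])

print(unclustered["olmo"])      # 3
print(len(clustered["olmo"]))   # 2
```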
October 28, 2025 at 12:58 AM
lol yea

also not widely known but a core difference between gscholar & semantic scholar (s2)

gscholar separates the UI from the data, so when you merge papers, the change is local to your own page & doesn't propagate to your coauthors

s2 updates the underlying data, so the UI reflects ground truth for all users
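roughly, as a toy sketch (made-up structures, not either product's real data model):

```python
# gscholar-style: shared records stay untouched; a merge lives in a
# per-user overlay, so only my profile page reflects it
shared_records = {"p1": "OLMo (preprint)", "p2": "OLMo (published)"}
my_overlay = {"p2": "p1"}        # my page shows one merged entry
coauthor_overlay = {}            # coauthors still see two entries

# s2-style: the merge edits the shared record itself,
# so every user's page reads the same ground truth
shared_clusters = {"p1": "olmo", "p2": "olmo"}
```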
October 27, 2025 at 6:26 PM
nice read, thx for sharing! I think the piece could use a follow-up / complement discussing the misaligned incentives that push scientists to compete rather than collaborate (notably, the section on data fragmentation)
October 12, 2025 at 3:03 AM
lol so much love for prepost-postpre training
October 9, 2025 at 5:13 PM
any other fans of pre-pretraining?
October 9, 2025 at 2:53 PM
same flight lol I just got to airport way too early
October 6, 2025 at 12:12 PM
ya totes, see u there!
October 2, 2025 at 11:34 PM
synthetic data mimics real data's rough shape, modality, types, schema, etc., but with fake values. models these days are quite proficient at operating over this kind of data & generating reasonable code; the main contribution here is the system design that replaces the repetitive exploratory data workflow
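as a tiny sketch of "same schema, fake values" (made-up column names & generator, not the actual system):

```python
import random
import string

# schema mirrors the real table's columns & types; all values are fabricated
schema = {"patient_id": "str", "age": "int", "icd10_code": "str"}

def fake_value(dtype: str):
    if dtype == "int":
        return random.randint(0, 99)
    return "".join(random.choices(string.ascii_uppercase + string.digits, k=6))

# a few synthetic rows with the real data's shape but none of its values
synthetic_rows = [
    {col: fake_value(dtype) for col, dtype in schema.items()}
    for _ in range(5)
]
print(synthetic_rows[0])
```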
October 2, 2025 at 6:01 PM
hehe i didnt do anythin!

the core is data voyager (arxiv.org/abs/2402.13610) but with a local LM instead of GPT

it generates code (map-reduce-filter) that transforms the data (csvs); a federated platform executes it & returns output back to the system. the system repeatedly interprets the output + generates more code
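very roughly, the loop looks something like this (hypothetical function & parameter names; the real system's interfaces differ):

```python
def discovery_loop(question: str, csv_paths: list[str], lm, platform, max_steps: int = 5):
    """Sketch of the generate -> execute -> interpret loop over csv tables.

    `lm` (a local language model client) and `platform` (the federated
    execution layer) are stand-ins, not real APIs.
    """
    context = question
    for _ in range(max_steps):
        # the LM writes a map/reduce/filter-style transform over the csvs
        code = lm.generate_code(context, csv_paths)
        # the federated platform runs it where the data lives & returns output
        output = platform.execute(code, csv_paths)
        # the LM interprets the output & folds it into the next round
        context = lm.interpret(context, output)
    return context
```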
Data-driven Discovery with Large Generative Models
With the accumulation of data at an unprecedented rate, its potential to fuel scientific discovery is growing exponentially. This position paper urges the Machine Learning (ML) community to exploit th...
arxiv.org
October 2, 2025 at 6:01 PM
nice post! reminds me of this ICLR 2023 paper that also had a short discussion of other architectures not showing as big a diff between adam/sgd as seen w/ modern transformers arxiv.org/abs/2304.139...
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
The success of the Adam optimizer on a wide array of architectures has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our theoretical understanding o...
arxiv.org
October 2, 2025 at 5:01 PM