Robert Nowak
@rdnowak.bsky.social
770 followers · 110 following · 41 posts
Director of the Center for the Advancement of Progress
Yes. Just write your thoughts in a rough and unpolished form, say rough paragraphs that contain the terse points you want to make. Then let 'er rip.
Section 7 is a wonderful description of the process they went through.
Something just isn't fully clicking. If you look at total yards and time of possession, they should have blown them out. Well, better anyway to peak later in the season, so let's hope that's what happens (like two seasons ago).
Packers get the win, but it wasn't pretty.
Thanks for participating and presenting your work!
Honored to have participated in this amazing event, to meet great people, and to learn about their work in the data science field.
Google promotes box shirts too
Reposted by Robert Nowak
Announcing the first workshop on Foundations of Language Model Reasoning (FoRLM) at NeurIPS 2025!

📝 Soliciting abstracts that advance foundational understanding of reasoning in language models, from theoretical analyses to rigorous empirical studies.

📆 Deadline: Sept 3, 2025
“the only way to predict or to control the functioning of such systems is by an intricate system of charms, spells, and incantations”
See you there!
UChicago is thrilled to host #MMLS2025 in just a few days!
We can’t wait to welcome the ML community to campus.

Huge thanks to our amazing sponsors:
@schmidtsciences.bsky.social
University of Chicago Department of Computer Science
@dsi-uchicago.bsky.social
Invenergy

🧵(1/3)
More likely midges. The truest sign of a healthy ecosystem.
Looking forward to a great MMLS!
The Midwest Machine Learning Symposium will happen in Chicago on June 23-24 on the University of Chicago campus (midwest-ml.org/2025/). We have an amazing lineup of speakers: @profsanjeevarora.bsky.social from Princeton, Heng Ji from UIUC, Tuomas Sandholm from CMU, and @ravenben.bsky.social from UChicago.
This is a collaboration with Ziyue Luo, @shroffness, and @kevinlauka.
SIEVE improves upon existing quality filtering methods in the DataComp-LM challenge, producing higher-quality LLM pretraining data and, in turn, better model performance.
This work is part of Jifan's broader research on efficient ML training, from active learning to label-efficient SFT for LLMs.
Why does this matter? High-quality data is the bedrock of LLM training. SIEVE makes it practical to filter trillions of web documents for specific domains like medical or legal text, using customizable natural language prompts.
SIEVE distills GPT-4's data filtering capabilities into lightweight models at <1% of the cost. These aren't minor improvements: filtering operations become roughly 500x more efficient.
🧵 Heard all the buzz around distilling from OpenAI models? Check out @jifanz's latest work, SIEVE, which shows how strategic distillation can make LLM development radically more cost-effective while matching quality.
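For the curious, here's a minimal sketch of the general recipe the thread above describes: have an expensive teacher model quality-label a small sample of web text against a natural language filtering prompt, then distill those labels into a lightweight classifier cheap enough to run over the full corpus. The prompt, the TF-IDF + logistic regression student, the threshold, and the model names are illustrative stand-ins, not SIEVE's actual components; the real system is also far more label-efficient about which documents the teacher ever sees.

```python
# Sketch of teacher-label-then-distill quality filtering (illustrative, not
# SIEVE's actual pipeline). Requires an OpenAI API key and scikit-learn.
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical natural-language filtering prompt; domain and wording are assumptions.
FILTER_PROMPT = (
    "Does the following passage contain high-quality medical text suitable "
    "for LLM pretraining? Answer YES or NO.\n\n{doc}"
)

client = OpenAI()

def teacher_label(doc: str) -> int:
    """Expensive teacher call on one document: 1 = keep, 0 = discard."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": FILTER_PROMPT.format(doc=doc[:4000])}],
    )
    return int(resp.choices[0].message.content.strip().upper().startswith("YES"))

def distill(sample_docs: list[str]):
    """Label a small sample with the teacher, then fit a lightweight student."""
    labels = [teacher_label(d) for d in sample_docs]
    student = make_pipeline(
        TfidfVectorizer(max_features=50_000),
        LogisticRegression(max_iter=1000),
    )
    student.fit(sample_docs, labels)
    return student

def filter_corpus(student, corpus: list[str], threshold: float = 0.5) -> list[str]:
    """Cheap student pass over the full corpus; no further teacher calls needed."""
    keep = student.predict_proba(corpus)[:, 1] >= threshold
    return [doc for doc, k in zip(corpus, keep) if k]
```

The cost asymmetry is the whole point: the teacher is called only on the small training sample, while the student (which costs fractions of a cent per million documents) does all the corpus-scale work.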
Maybe Trump should have read my mom's book: "For the first six weeks, the embryo, whether XX or XY, coasts along in sexual ambiguity." p. 25
Reposted by Robert Nowak
Task vectors are akin to punchcards: you feed them to your LLM and it implements specific tasks, without in-context demonstrations. Liu's new paper examines at what scale, where in the network, and when during training they emerge, and how to encourage their emergence.

arxiv.org/pdf/2501.09240
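To make the idea concrete, here is a hedged sketch of task-vector extraction and injection with a small HuggingFace model: cache a mid-layer hidden state from a few in-context demonstrations, then patch it into a zero-shot forward pass via a hook. The layer index, the additive patching style, and the toy antonym task are my illustrative assumptions; see the linked paper for the actual experimental protocol.

```python
# Sketch: extract a "task vector" from demonstrations, inject it zero-shot.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
model.eval()
LAYER = 6  # assumed middle layer; where task vectors live is a question the paper studies

def last_token_hidden(prompt: str) -> torch.Tensor:
    """Hidden state at LAYER for the final token of `prompt`."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]

# Build the task vector by averaging over a few demos of a toy antonym task.
demos = ["hot -> cold\nbig -> small\nup ->", "fast -> slow\nhigh -> low\ntall ->"]
task_vec = torch.stack([last_token_hidden(p) for p in demos]).mean(0)

def patch_hook(module, inputs, output):
    # Add the task vector to the hidden states leaving this block.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + task_vec
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# Zero-shot query: no demonstrations, just the bare pattern plus the patch.
# (hidden_states[LAYER] is the output of block LAYER-1, hence the index below.)
handle = model.transformer.h[LAYER - 1].register_forward_hook(patch_hook)
ids = tok("wet ->", return_tensors="pt")
with torch.no_grad():
    gen = model.generate(**ids, max_new_tokens=3, do_sample=False)
handle.remove()
print(tok.decode(gen[0]))
```

Whether a bare additive patch like this works at all, and at which layers and model scales it starts working, is exactly the kind of question the paper investigates.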
p.s. we don't know for sure if I said this or not
Is the solution treating everything electronic as "fake"?
Maybe?