Lightnews — Scholar-powered news

Calvin McCarter

@calvinmccarter.bsky.social

What makes tabular data unique (and interesting!) is not merely that it's arranged into rows and columns. New blogpost: calvinmccarter.substack.com/p/the-idiosy...

The idiosyncrasies of tabular data

The things that make tabular data different

calvinmccarter.substack.com

August 19, 2025 at 3:06 AM

Calvin McCarter

@calvinmccarter.bsky.social

Has anyone tried far-UVC in their home? It's now dropped into the ~$300 price range where I'm interested in trying it for myself. substack.com/home/post/p-...

Flipping the switch on far-UVC

We’ve known about far-UVC’s promise for a decade. Why isn't it everywhere?

substack.com

March 11, 2025 at 2:43 PM

Calvin McCarter

@calvinmccarter.bsky.social

Here's a link to the report: cleanlabelproject.org/wp-content/u... (TLDR heuristics: whey is better than plant-based, non-organic is better than organic, unflavored is better than chocolate-flavored)

January 12, 2025 at 8:15 PM

Calvin McCarter

@calvinmccarter.bsky.social

Ultra exciting! And it's gratifying to see that this method uses the kernel density integral preprocessing method that I published in @tmlr-pub.bsky.social (2023). (One takeaway: even if your ML research focus isn't deep learning, pursue directions that complement rather than compete with it.)

Samuel Müller @sammuller.bsky.social · Jan 8

This might be the first time after 10 years that boosted trees are not the best default choice when working with data in tables.
Instead a pre-trained neural network is, the new TabPFN, as we just published in Nature 🎉

January 8, 2025 at 7:03 PM

Reposted by Calvin McCarter

Alan Amin

@alannawzadamin.bsky.social

How do you go from a hit in your antibody screen to a suitable drug? Now introducing CloneBO: we optimize antibodies in the lab by teaching a generative model how we optimize them in our bodies!
w/ Nat Gruver, Yilun Kuang, Lily Li, @andrewgwils.bsky.social and the team at Big Hat! 1/7

December 17, 2024 at 4:01 PM

Calvin McCarter

@calvinmccarter.bsky.social

Will neural networks achieve AGI before they figure out how to do tokenization internally? Or, on the way to AGI, will they invent a tokenizer that "just works"? Another way of framing it: is tokenization AGI-complete?

November 27, 2024 at 5:40 AM

Calvin McCarter

@calvinmccarter.bsky.social

JMLR and TMLR provide better reviewing, but worse publicity for accepted papers. Their only real platform, their accounts on X, gets very little engagement these days. They should be on 🦋. Better yet, *MLR should have a biweekly or monthly arXiv-style email newsletter. @thegautamkamath.bsky.social

November 24, 2024 at 4:48 PM

Calvin McCarter

@calvinmccarter.bsky.social

Pretty interesting new paper in TMLR: "Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density" openreview.net/forum?id=8Vk...

Controlling the Fidelity and Diversity of Deep Generative Models...

We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with either enhanced fidelity or increased diversity. Our approach involves...

openreview.net

November 22, 2024 at 10:22 PM

Calvin McCarter

@calvinmccarter.bsky.social

Twitter is still a better place to share ML papers than Bluesky:

November 17, 2024 at 9:49 PM

Calvin McCarter

@calvinmccarter.bsky.social

Excited to share a new paper published in TMLR, "Towards Backwards-Compatible Data with Confounded Domain Adaptation", solving a key problem in using AI for biology. How can you combine datasets across settings, when "what you measure" and "how you measure it" are confounded? 1/7

November 17, 2024 at 6:08 PM

Calvin McCarter

@calvinmccarter.bsky.social

Suppose you collect tissue samples with varying degrees of freshness, and want to correct for this. But what if freshness is also correlated with true biological differences (e.g., healthy vs. cancer tissue)? My new paper out in TMLR addresses this problem: openreview.net/forum?id=GSp...

Towards Backwards-Compatible Data with Confounded Domain Adaptation

Most current domain adaptation methods address either covariate shift or label shift, but are not applicable where they occur simultaneously and are confounded with each other. Domain adaptation...

openreview.net

November 15, 2024 at 4:02 AM
