@alandenadel.bsky.social
Thank you again to all my collaborators for their contributions and thoughtful feedback.

Madeline Hughes
Akshaya Thoutam
Anay Gupta
Andrew Navia
@nfusi.bsky.social
Srivatsan Raghavan
Peter Winter
@avapamini.bsky.social
@lcrawford.bsky.social

I appreciate any feedback!
November 7, 2025 at 8:07 PM
Interestingly, we saw improved zero-shot performance when increasing model size (but still no scaling with pre-training dataset size) for both scVI and Geneformer.
November 7, 2025 at 8:07 PM
The Nicheformer authors observed a similar phenomenon: when Nicheformer was pre-trained on 1% of their 110M-cell dataset, performance did not decrease dramatically.
November 7, 2025 at 8:07 PM
There is an implicit assumption that scaling the pre-training dataset size is inherently better, but the only demonstrated scaling law we know of is in terms of data quality:
arxiv.org/abs/2503.02726
Measurement noise scaling laws for cellular representation learning
November 7, 2025 at 8:07 PM
This work addresses a critical consideration in training large-scale models: the size and diversity of the pre-training corpus.
November 7, 2025 at 8:07 PM
And for out-of-distribution perturbation response prediction.
November 7, 2025 at 8:07 PM
We also observed similar results for zero-shot batch integration.
November 7, 2025 at 8:07 PM
The learning saturation points were always 25% or less of the full pre-training dataset when evaluating the models on zero-shot classification, and 10% or less when evaluating them on fine-tuned classification.
November 7, 2025 at 8:07 PM
To assess the extent to which this plateauing generalized across datasets and tasks, we identified the "learning saturation point" for each model. This is the minimum pre-training dataset size for which a model surpassed 95% of the maximum performance observed.
November 7, 2025 at 8:07 PM
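(Not from the thread: a minimal sketch of how such a learning saturation point can be computed, assuming each model is evaluated as a list of (pre-training subset fraction, downstream score) pairs; the numbers below are hypothetical.)

```python
# Minimal sketch (hypothetical data, not the authors' code): the learning
# saturation point is the smallest pre-training subset fraction whose
# downstream score reaches 95% of the best score observed across all subsets.

def learning_saturation_point(results, threshold=0.95):
    """results: list of (subset_fraction, score) pairs, e.g. [(0.01, 0.62), ...]"""
    best = max(score for _, score in results)
    return min(frac for frac, score in results if score >= threshold * best)

# Hypothetical zero-shot classification scores for one model
scores = [(0.01, 0.62), (0.10, 0.78), (0.25, 0.80), (0.50, 0.81), (1.00, 0.81)]
print(learning_saturation_point(scores))  # -> 0.1
```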
Across all model architectures, performance on cell type classification (both zero-shot and fine-tuned) plateaued at a small fraction of the total pre-training dataset size, regardless of dataset diversity. When fine-tuning, pre-training had almost no impact on performance.
November 7, 2025 at 8:07 PM
We assessed five model architectures pre-trained to serve as single-cell foundation models (scFMs) for single-cell RNA-seq: PCA, scVI, SSL, Geneformer, and SCimilarity. We pre-trained these models on subsets of the scTab corpus using three downsampling schemes.
November 7, 2025 at 8:07 PM
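(The specific downsampling schemes aren't spelled out in this post; purely as a rough illustration, a uniform random cell-level downsample of a corpus stored as an AnnData object might look like this, with hypothetical subset fractions.)

```python
# Rough illustration only (the thread does not name the downsampling schemes):
# draw a uniform random cell-level subset of a corpus stored as an AnnData object.
import numpy as np
import anndata as ad

def downsample_cells(adata: ad.AnnData, fraction: float, seed: int = 0) -> ad.AnnData:
    rng = np.random.default_rng(seed)
    n_keep = int(adata.n_obs * fraction)
    idx = rng.choice(adata.n_obs, size=n_keep, replace=False)
    return adata[np.sort(idx)].copy()

# Hypothetical pre-training subsets:
# subsets = {f: downsample_cells(corpus, f) for f in (0.01, 0.10, 0.25, 0.50, 1.00)}
```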
In our expanded analysis, we show that single-cell foundation models tend to plateau in downstream task performance with pre-training subsets that are a small fraction of the size of current pre-training datasets.
November 7, 2025 at 8:07 PM
Thank you to all my collaborators for their contributions and thoughtful feedback.

Madeline Hughes
Akshaya Thoutam
@anaygupta.bsky.social
Andrew Navia
@nfusi.bsky.social
Srivatsan Raghavan
Peter Winter
@avapamini.bsky.social
@lcrawford.bsky.social

I welcome any comments!
December 18, 2024 at 6:48 PM
Our results highlight the need for a more nuanced approach, balancing dataset size and diversity with careful attention to model architectures and benchmarking.
December 18, 2024 at 6:48 PM
Our findings underscore the importance of prioritizing data quality and content over sheer size. Developers of scFMs and large databases should weigh data quality and content rather than simply scaling up models and databases, which we have shown is unlikely to meaningfully improve performance on its own.
December 18, 2024 at 6:48 PM
While neural scaling laws observed in other domains suggest that increasing dataset size leads to better performance, our findings show that, past a learning saturation point, simply increasing the pre-training dataset size doesn't necessarily improve performance on downstream tasks.
December 18, 2024 at 6:48 PM
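(Background sketch, not from the thread: the neural scaling laws referenced here are typically power laws, loss(D) ≈ a·D^(−α) in dataset size D; the numbers below are made up for illustration.)

```python
# Background sketch with made-up numbers: a power-law scaling curve keeps
# improving as dataset size D grows, whereas a saturating curve plateaus.
import numpy as np

D = np.array([1e5, 1e6, 1e7, 1e8])    # pre-training dataset sizes (cells)
loss = 2.0 * D ** -0.1                # idealized power law: loss = a * D**(-alpha)

# Recover alpha with a linear fit in log-log space: log(loss) = log(a) - alpha*log(D)
slope, _ = np.polyfit(np.log(D), np.log(loss), 1)
print(f"alpha ≈ {-slope:.2f}")        # -> alpha ≈ 0.10
```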