Ethan Weinberger
ethanweinberger.bsky.social
Ph.D student in Computer Science and Engineering at the University of Washington working with Su-In Lee.
Reposted by Ethan Weinberger
I'm happy to share that our gReLU package is now published in Nature Methods!

www.nature.com/articles/s41...
gReLU: a comprehensive framework for DNA sequence modeling and design - Nature Methods
gReLU advances deep-learning-based modeling and analysis of DNA sequences with comprehensive toolsets and versatile applications.
October 15, 2025 at 9:21 PM
Reposted by Ethan Weinberger
scverse turns 3!
What started as a shared vision for interoperable single-cell analysis has become a vibrant, global community.
From AnnData to full multimodal pipelines, we’re building the future of everything single-cell and spatial omics, together.
Here’s to what’s next!
May 17, 2025 at 10:08 PM
Reposted by Ethan Weinberger
📣 Mark your calendars! The 2025 edition of the scverse conference will take place on 17-19 November at Stanford University (US) scverse.org/conference20...

Call for abstracts and registrations coming soon!
scverse conference 2025
Follow us on our channels to learn more details in the coming weeks
May 12, 2025 at 10:47 PM
Reposted by Ethan Weinberger
Our preprint on designing and editing cis-regulatory elements using Ledidi is out! Ledidi turns *any* ML model (or set of models) into a designer of edits to DNA sequences that induce desired characteristics.

Preprint: www.biorxiv.org/content/10.1...
GitHub: github.com/jmschrei/led...
Programmatic design and editing of cis-regulatory elements
The development of modern genome editing tools has enabled researchers to make such edits with high precision but has left unsolved the problem of designing these edits. As a solution, we propose Ledi...
April 24, 2025 at 12:59 PM
Reposted by Ethan Weinberger
genomebiology.biomedcentral.com/articles/10....

Quite an indictment of some of the current single cell "virtual cell" foundation models. Even for the relatively mundane applications, cell labeling, batch correction etc, they are poor compared to much simpler & cheaper methods.
Zero-shot evaluation reveals limitations of single-cell foundation models - Genome Biology
Foundation models such as scGPT and Geneformer have not been rigorously evaluated in a setting where they are used without any further training (i.e., zero-shot). Understanding the performance of mode...
April 20, 2025 at 4:14 PM
Reposted by Ethan Weinberger
First-ever CODE ML workshop at ICML!
July 18 or 19, 2025, Vancouver, Canada

Submit papers on OSS libraries, maintenance, best practices & more.
Format: 4-page non-archival papers
Due: May 19

codeml-workshop.github.io/codeml2025/#...
CODEML Workshop @ ICML 2025
Championing Open-source Development in Machine Learning
April 17, 2025 at 5:42 PM
Reposted by Ethan Weinberger
Most people haven’t heard of this test, which is available in the US. It accurately predicts Alzheimer’s (not just if there’s a risk, but when). It is modulated by exercise and likely other lifestyle factors.
Here’s (almost) everything we know about it
erictopol.substack.com/p/the-breakt...
April 14, 2025 at 1:48 PM
Reposted by Ethan Weinberger
Some encouraging news for cross-gene generalization of allele effects in S2F models. www.biorxiv.org/content/10.1...
Deep genomic models of allele-specific measurements
Allele-specific quantification of sequencing data, such as gene expression, allows for a causal investigation of how DNA sequence variations influence cis gene regulation. Current methods for analyzin...
April 16, 2025 at 1:46 AM
Reposted by Ethan Weinberger
New preprint out!
This is probably my most important paper. To my deep chagrin, it has no math.
XIST is a non-coding RNA exclusive to XX females. It silences one of the X chromosomes.
So what is it doing in male heart Schwann cells?
Male XIST expression in cardiac pseudo-glia does not induce X chromosome inactivation https://www.biorxiv.org/content/10.1101/2025.04.09.648005v1
April 15, 2025 at 7:40 PM
Reposted by Ethan Weinberger
As an academic who works on tech to discover causes and cures of disease, contributing to novel drugs reaching patients has been thrilling.
Thanks to @statnews.com naming me to STATUS List 2025 honoring leaders in health, medicine, and science!

#STATUSList
www.statnews.com/status-list/...
April 10, 2025 at 11:10 AM
Reposted by Ethan Weinberger
Our new preprint is out! We optimized our open-source platform, HyDrop (v2), for scATAC sequencing and generated new atlases for the mouse cortex and Drosophila embryo with 607k cells. Now, we can train sequence-to-function models on data generated with HyDrop v2!
www.biorxiv.org/content/10.1...
April 4, 2025 at 8:52 AM
Reposted by Ethan Weinberger
The cover of Nature Biomedical Engineering features work from #UWAllen’s @suinlee.bsky.social on techniques for auditing #AI dermatology image classifiers—one of two projects from the lab highlighted in this issue, alongside a deep learning model for cancer insights. www.nature.com/natbiomedeng...
Nature Biomedical Engineering - Auditing medical machine learning
This issue highlights advances in applications of machine learning for diagnosing disease and for sorting and classifying health data, and includes a...
April 1, 2025 at 9:50 PM
Reposted by Ethan Weinberger
Human Body Single-Cell Atlas of 3D Genome Organization and DNA Methylation https://www.biorxiv.org/content/10.1101/2025.03.23.644697v1
March 24, 2025 at 1:34 PM
Reposted by Ethan Weinberger
Our new pre-print, investigating a few important questions when we train S2F models on different types of MPRA datasets. Congrats to Yilun and @xinmingtu.bsky.social www.biorxiv.org/content/10.1...
Investigating Data Size, Sequence Diversity, and Model Complexity in MPRA-based Sequence-to-Function Prediction
We created the MPRA Dataset Collection (MDC), a curated resource of MPRA data from 12 studies comprising over 150 million labeled DNA subsequences. These datasets include both random and natural genom...
March 15, 2025 at 3:02 AM
Reposted by Ethan Weinberger
Wow. "NIH" canceled my co-mentored (with Dave Sulzer) PhD student's F31 funding. His work is on understanding the genetics and neuroscience of language learning disorders. F31 provides no indirect $ to Columbia, just pays his salary. Not that it should matter, but he's an American citizen. W.T.F.
March 11, 2025 at 12:41 PM
Reposted by Ethan Weinberger
Congratulations to #UWAllen professor @suinlee.bsky.social on her election as a Fellow of the International Society for Computational Biology! @iscb.bsky.social honored Lee for her pioneering work on explainable #AI for biology and medicine. www.iscb.org/iscb-news-it... #PopulationHealth #ThisIsUW
March 12, 2025 at 5:53 PM
Reposted by Ethan Weinberger
Awesome summary of the field. An important point is to separate the design method from the oracle model being used. Sometimes, people say they're proposing a new design method but mean a cool new oracle model.

Modelling and design of transcriptional enhancers

www.nature.com/articles/s44...
Modelling and design of transcriptional enhancers - Nature Reviews Bioengineering
Enhancers are genomic elements critical for regulating gene expression. In this Review, the authors discuss how sequence-to-function models can be used to unravel the rules underlying enhancer activit...
March 3, 2025 at 6:58 PM
Reposted by Ethan Weinberger
Workshop on Advances in Post-Bayesian methods (May 15--16, UCL): postbayes.github.io/workshop2025/
Advances in post-Bayesian methods – workshop2025
February 26, 2025 at 5:29 PM
Reposted by Ethan Weinberger
Our new paper describing a scalable approach for training sequence-to-function models on personal genomes ("personal genome training"), includes our observations on when this works and its limitations. www.biorxiv.org/content/10.1...
Congrats: Anna, @xinmingtu.bsky.social, @lxsasse.bsky.social
A scalable approach to investigating sequence-to-expression prediction from personal genomes
A key promise of sequence-to-function (S2F) models is their ability to evaluate arbitrary sequence inputs, providing a robust framework for understanding genotype-phenotype relationships. However, despite strong performance across genomic loci, S2F models struggle with inter-individual variation. Training a model to make genotype-dependent predictions at a single locus (an approach we call personal genome training) offers a potential solution. We introduce SAGE-net, a scalable framework and software package for training and evaluating S2F models using personal genomes. Leveraging its scalability, we conduct extensive experiments on model and training hyperparameters, demonstrating that training on personal genomes improves predictions for held-out individuals. However, the model achieves this by identifying predictive variants rather than learning a cis-regulatory grammar that generalizes across loci. This failure to generalize persists across a range of hyperparameter settings. These findings highlight the need for further exploration to unlock the full potential of S2F models in decoding the regulatory grammar of personal genomes. Scalable software and infrastructure development will be critical to this progress.
February 23, 2025 at 11:31 PM
Reposted by Ethan Weinberger
My heart goes out to all of the people at the NIH and CDC who were fired recently. These people weren't fired for being bad at their job or a waste of resources -- they were fired because they were easy to fire by outsiders trying to meet a quota. They worked years/decades.. for this?
February 16, 2025 at 1:31 PM
Reposted by Ethan Weinberger
Given that science funding is under attack, it might be as good a time as any to reflect on how we spend our precious dollars. Cutting out expenditure publishing papers in overpriced journals might be a good thing to seriously consider once again.
February 11, 2025 at 6:06 PM
Reposted by Ethan Weinberger
MLCB is an excellent conference and a great opportunity to meet other people in the field. Highly recommend attending!
[SAVE THE DATE] MLCB 2025 is happening Sept 10-11 at the NY Genome Center in NYC!

Attend the premier conference at the intersection of ML & Bio, share your research and make lasting connections!

Submission deadline: June 1
More details: mlcb.github.io

Help spread the word—please RT! #MLCB2025
February 5, 2025 at 5:05 PM
Reposted by Ethan Weinberger
Mean of the training data still absolutely crushing it for perturbation prediction.
www.biorxiv.org/content/10.1...
January 24, 2025 at 12:59 PM