Ian Shi
@heyitsmeianshi.bsky.social
PhD Student @ University of Toronto

Building foundation models for genomics!
And thanks also to our collaborators who provided datasets and thoughtful advice, including Simran, Andrew, Cyrus, Defne, Jessica, Kaitlin, Ilyes, @bowang87.bsky.social, and @quaidmorris.bsky.social! (+ MSK HPC for making this all possible)
July 15, 2025 at 6:41 PM
A huge thanks to @taykhoomdalal.bsky.social, @phil-fradkin.bsky.social, and Divya for pushing this work across the finish line!
July 15, 2025 at 6:41 PM
mRNABench is available on GitHub: github.com/morrislab/mR..., where we've made an effort to keep the codebase accessible, extensible, and reproducible!

A Colab notebook is available: colab.research.google.com/drive/1VZF5N...

Details on our findings are on BioRxiv: biorxiv.org/content/10.1...
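
If you want a feel for the intended workflow before diving into the docs, here's a minimal sketch. Everything below (`mrna_bench`, `load_task`, `load_model`) is an assumed, illustrative interface, so check the repo and Colab for the real one:

```python
# Hypothetical quickstart; the real API is documented in the repo and
# Colab above. Module and function names here are illustrative
# assumptions, not the actual mRNABench interface.
from mrna_bench import load_task, load_model  # assumed entry points

seqs, labels = load_task("mrna_stability")  # assumed task identifier
model = load_model("orthrus")               # a frozen foundation model
embeddings = model.embed(seqs)              # one vector per transcript
```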
July 15, 2025 at 6:41 PM
As these results show, most models struggle to compositionally generalize, revealing a significant gap in their ability to truly understand regulation.

We hope that this experimental setup and others like it can inform new directions for nucleotide foundation model development.
July 15, 2025 at 6:41 PM
(4) Together with Divya Koyyalagunta, we further assess the ability of foundation models to compositionally generalize from learned motifs.

Models are exposed to either of two sequence elements that promote translation, but never both together, and we task them with predicting the effect of the unseen combination.
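
Roughly, the split looks like the following sketch. It assumes each example is annotated with which elements it contains; the field names are mine, not the paper's:

```python
# Minimal sketch of a compositional-generalization split, assuming
# each example carries flags for two hypothetical translation-
# promoting elements ("A" and "B").
def compositional_split(examples):
    train, test = [], []
    for ex in examples:
        if ex["has_element_a"] and ex["has_element_b"]:
            test.append(ex)   # the unseen A+B combination is held out
        else:
            train.append(ex)  # A alone, B alone, or neither
    return train, test
```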
July 15, 2025 at 6:41 PM
(3) Finally, we assess the limitations of current benchmarking and modelling efforts.

A common source of data leakage is sequence homology, which inflates performance estimates unless data splits account for it. We demonstrate the impact of improper splitting on our tasks.
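
For readers unfamiliar with homology-aware splitting, the fix is conceptually simple: split by similarity cluster rather than by individual sequence, so homologs never straddle the train/test boundary. A minimal sketch (the clustering step, e.g. via an external tool like MMseqs2, and the `cluster_ids` input are assumptions, not the paper's exact pipeline):

```python
import random

# Assign whole similarity clusters to train or test.
# `cluster_ids[i]` is assumed to be the cluster label of sequence i.
def homology_split(cluster_ids, test_frac=0.2, seed=0):
    clusters = sorted(set(cluster_ids))
    random.Random(seed).shuffle(clusters)
    n_test = max(1, int(len(clusters) * test_frac))
    test_clusters = set(clusters[:n_test])
    test_idx = [i for i, c in enumerate(cluster_ids) if c in test_clusters]
    train_idx = [i for i, c in enumerate(cluster_ids) if c not in test_clusters]
    return train_idx, test_idx
```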
July 15, 2025 at 6:41 PM
@taykhoomdalal.bsky.social further explored this phenomenon and developed a joint CL + MLM objective, demonstrating that the joint loss results in superior downstream performance. Remarkably, adding an MLM objective to Orthrus achieves SOTA results with 700x fewer parameters.
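
In spirit, a joint objective can be as simple as a weighted sum of an InfoNCE-style contrastive term and a masked-token cross-entropy term. A hedged sketch, not the paper's actual formulation (`alpha`, the temperature, and the input conventions are my assumptions):

```python
import torch
import torch.nn.functional as F

def joint_loss(emb_a, emb_b, mlm_logits, mlm_targets, temp=0.07, alpha=0.5):
    # Contrastive term: paired views (emb_a[i], emb_b[i]) are positives.
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.T / temp
    targets = torch.arange(len(emb_a), device=logits.device)
    cl = F.cross_entropy(logits, targets)
    # MLM term: predict identities of masked tokens (-100 = unmasked).
    mlm = F.cross_entropy(
        mlm_logits.view(-1, mlm_logits.size(-1)),
        mlm_targets.view(-1),
        ignore_index=-100,
    )
    return alpha * cl + (1 - alpha) * mlm
```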
July 15, 2025 at 6:41 PM
(2) Choice of pre-training objective has a noticeable impact on downstream performance.

Orthrus, trained using contrastive learning (CL), performs better on "global" sequence-level property prediction than on finer-resolution tasks, consistent with known CL limitations.
July 15, 2025 at 6:41 PM
In one of the coolest analyses of this paper, @phil-fradkin.bsky.social quantified the distributional differences between mRNA, ncRNA, and genomic regions through their cross-compressibility under a Huffman encoding scheme, reinforcing their distinct regulatory grammars.
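
My loose reconstruction of the idea (the paper's exact procedure may differ; the k-mer size, unseen-k-mer fallback, and per-k-mer averaging are my assumptions): fit a Huffman code to k-mer frequencies from one sequence class, then measure how efficiently that code compresses another class. Poor cross-class compression suggests distinct sequence statistics:

```python
import heapq
from collections import Counter
from itertools import count

def kmer_counts(seqs, k=3):
    return Counter(kmer for s in seqs for kmer in
                   (s[i:i + k] for i in range(len(s) - k + 1)))

def huffman_code_lengths(freqs):
    # Standard Huffman construction; returns bits per symbol.
    tie = count()  # tie-breaker so heapq never compares dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (fa + fb, next(tie), merged))
    return heap[0][2]

def cross_bits(train_seqs, test_seqs, k=3):
    lengths = huffman_code_lengths(kmer_counts(train_seqs, k))
    test = kmer_counts(test_seqs, k)
    # K-mers unseen in training get a pessimistic fallback length.
    fallback = max(lengths.values()) + 1
    total = sum(n * lengths.get(km, fallback) for km, n in test.items())
    return total / sum(test.values())  # average bits per k-mer
```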
July 15, 2025 at 6:41 PM
(1) Unsurprisingly, we find that models pre-trained on mRNA sequences perform better on downstream mRNA tasks.

While that makes biological sense, the result might seem counterintuitive at first: since mRNAs arise from the genome, shouldn't genomic models be able to model mRNA?
July 15, 2025 at 6:41 PM
On these datasets, we assess the embedding quality of almost all existing nucleotide foundation models, including Evo2, RiNALMo, AIDO.RNA, Orthrus, SpliceBERT, and others.

Using linear probing, we conduct over 100K experiments, revealing several insights:
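
A single linear-probing run is lightweight, which is what makes 100K of them feasible. A minimal sketch with stand-in data (real runs would use frozen model embeddings in place of the random `X`):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 256))   # stand-in for frozen embeddings
y = rng.integers(0, 2, size=500)  # stand-in binary task labels

# Fit a simple linear classifier on the frozen representations.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"linear probe accuracy: {probe.score(X_te, y_te):.3f}")
```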
July 15, 2025 at 6:41 PM
In contrast to existing benchmarks, 𝐦𝐑𝐍𝐀𝐁𝐞𝐧𝐜𝐡 focuses on mRNA biology, assessing prediction of:

- mRNA stability
- Mean ribosome loading
- mRNA sub-cellular localization
- RNA-Protein interaction
- Pathogenicity of variants
July 15, 2025 at 6:41 PM