Ian Shi
heyitsmeianshi.bsky.social
PhD Student @ University of Toronto

Building foundation models for genomics!
(4) Together with Divya Koyyalagunta, we further assess the ability of foundation models to compositionally generalize from learned motifs.

Models are exposed to either of two sequence elements that promote translation, but never both in the same sequence, and we task them with predicting the unseen combination.
July 15, 2025 at 6:41 PM
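A minimal sketch of the held-out-combination setup described in the post above, assuming each element can be detected as a plain substring. The function name and the motif strings are hypothetical placeholders, and keeping sequences with neither element in the training set is an assumption, not something the thread states.

```python
def compositional_split(seqs, motif_a, motif_b):
    """Hold out every sequence that contains both motifs.

    Training data sees motif_a alone, motif_b alone, or neither (assumption);
    the held-out set contains only the unseen combination.
    """
    train, held_out = [], []
    for seq in seqs:
        has_a = motif_a in seq
        has_b = motif_b in seq
        if has_a and has_b:
            held_out.append(seq)
        else:
            train.append(seq)
    return train, held_out


# Hypothetical placeholder motifs, chosen only for illustration.
MOTIF_A = "GGGAAA"
MOTIF_B = "CCCTTT"

example = [
    "ACGT" + MOTIF_A + "ACGT",            # element A only -> train
    "ACGT" + MOTIF_B + "ACGT",            # element B only -> train
    "ACGTACGTACGT",                       # neither -> train (assumption)
    MOTIF_A + "ACGT" + MOTIF_B,           # both -> held out
]
train_seqs, held_out_seqs = compositional_split(example, MOTIF_A, MOTIF_B)
```

Under this split, a model that scores well on `held_out_seqs` must combine what it learned about each element separately, since it never saw them together.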
(3) Finally, we assess the limitations of current benchmarking and modelling efforts.

A common source of data leakage is sequence homology, which inflates performance estimates unless data splits are made carefully. We demonstrate the impact of improper splitting in our tasks.
July 15, 2025 at 6:41 PM
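To illustrate the homology point in the post above, here is a toy sketch of a split that keeps similar sequences on the same side. It is not the benchmark's procedure: k-mer Jaccard similarity stands in for real homology detection (tools such as MMseqs2 or CD-HIT are typical in practice), and the function names, `k`, and `threshold` are illustrative assumptions.

```python
from itertools import combinations


def kmer_set(seq, k=4):
    """All overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_similarity(seqs, k=4, threshold=0.5):
    """Single-linkage clustering via union-find; returns one label per sequence."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    sets = [kmer_set(s, k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]


def homology_aware_split(seqs, test_fraction=0.2, k=4, threshold=0.5):
    """Assign whole clusters to train or test so no cluster straddles the split."""
    labels = cluster_by_similarity(seqs, k, threshold)
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    train, test = [], []
    target = test_fraction * len(seqs)
    # Fill the test set with the smallest clusters first (deterministic order).
    for members in sorted(clusters.values(), key=lambda m: (len(m), m[0])):
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test)


toy_seqs = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "TTTTGGGGCA", "GATTACAGAT"]
toy_labels = cluster_by_similarity(toy_seqs)
toy_train, toy_test = homology_aware_split(toy_seqs, test_fraction=0.3)
```

A random per-sequence split of `toy_seqs` could put one near-duplicate in train and its partner in test, which is the leakage the post warns about. Splitting by cluster avoids that, at the cost of less exact split sizes.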
(2) Choice of pre-training objective has a noticeable impact on downstream performance.

Orthrus, trained using contrastive learning (CL), performs better on "global" sequence-level property prediction than on finer-resolution tasks, consistent with known CL limitations.
July 15, 2025 at 6:41 PM
In one of the coolest analyses of this paper, @phil-fradkin.bsky.social quantified the distributional differences between mRNA, ncRNA, and genomic regions through their cross-compressibility under a Huffman encoding scheme, reinforcing their distinct regulatory grammars.
July 15, 2025 at 6:41 PM
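One way to read "cross-compressibility" in the post above: build a Huffman code from one corpus, then measure how many bits per base it needs to encode another. The sketch below follows that reading. The symbol choice (non-overlapping k-mers over an ACGT alphabet), the add-one smoothing, and all function names are assumptions, and the paper's exact procedure may differ.

```python
import heapq
from collections import Counter
from itertools import product


def kmer_counts(seqs, k):
    """Count non-overlapping k-mers, starting every possible k-mer at 1
    (add-one smoothing) so that each one receives a codeword."""
    counts = Counter({"".join(p): 1 for p in product("ACGT", repeat=k)})
    for seq in seqs:
        for i in range(0, len(seq) - k + 1, k):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip k-mers containing non-ACGT characters
                counts[kmer] += 1
    return counts


def huffman_code_lengths(counts):
    """Return the Huffman codeword length in bits for each symbol."""
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (w1 + w2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths


def bits_per_base(train_seqs, test_seqs, k=3):
    """Encode test_seqs with a Huffman code fitted on train_seqs."""
    lengths = huffman_code_lengths(kmer_counts(train_seqs, k))
    total_bits = 0
    total_bases = 0
    for seq in test_seqs:
        for i in range(0, len(seq) - k + 1, k):
            kmer = seq[i:i + k]
            if kmer in lengths:
                total_bits += lengths[kmer]
                total_bases += k
    if total_bases == 0:
        raise ValueError("no encodable k-mers in test_seqs")
    return total_bits / total_bases


# Toy check with k=1: a code fitted on A-rich text encodes A cheaply and C expensively.
self_cost = bits_per_base(["AAAAAAAA"], ["AAAA"], k=1)
cross_cost = bits_per_base(["AAAAAAAA"], ["CCCC"], k=1)
```

If two sequence classes shared the same statistics, a code fitted on one would encode the other at roughly the same cost as itself. A large gap between self and cross cost is the kind of signal the post interprets as distinct grammars.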
(1) Unsurprisingly, we find that models pre-trained on mRNA sequences perform better on downstream mRNA tasks.

While that makes biological sense, the result may seem counterintuitive at first -- since mRNAs arise from the genome, shouldn't genomic models be able to model mRNA?
July 15, 2025 at 6:41 PM
In contrast to existing benchmarks, 𝐦𝐑𝐍𝐀𝐁𝐞𝐧𝐜𝐡 focuses on mRNA biology, assessing prediction of:

- mRNA stability
- Mean ribosome loading
- mRNA sub-cellular localization
- RNA-Protein interaction
- Pathogenicity of variants
July 15, 2025 at 6:41 PM
We're excited to release 𝐦𝐑𝐍𝐀𝐁𝐞𝐧𝐜𝐡, a new benchmark suite for mRNA biology containing 10 diverse datasets with 59 prediction tasks, evaluating 18 foundation model families.

Paper: biorxiv.org/content/10.1...
GitHub: github.com/morrislab/mR...
Blog: blank.bio/post/mrnabench
July 15, 2025 at 6:41 PM