Surag Nair
suragnair.bsky.social
Machine learning and genetics @Genentech. Previously CS PhD @Stanford.

suragnair.github.io
Reposted by Surag Nair
An artistic reinterpretation of the Nona model's schematics
November 10, 2025 at 11:24 PM
And finally, we have a postdoc opening on our team. You’ll get to work on cutting-edge models while gaining a unique perspective on how these tools can shape the future of AI in genomics. Join us! 15/15

careers.gene.com/us/en/job/20...
Postdoctoral Fellow - Regulatory Language Understanding (ReLU) Lab in South San Francisco, California, United States of America | Students & Graduates at Genentech
November 10, 2025 at 9:01 PM
Earlier this year, we presented Nona @MIA_at_Broad (www.youtube.com/watch?v=l14F...). It also received the Best Talk Award at RegSys, ISMB 2025. 14/
MIA: Surag Nair, Nona, A Novel Multimodal Masked Modeling Framework for Functional Genomics
YouTube video by Broad Institute
November 10, 2025 at 9:01 PM
We are working on releasing the code and hope to get it out very soon. In the meantime, please don't hesitate to reach out if you have any suggestions or questions. 13/
November 10, 2025 at 9:01 PM
This would not have been possible without my amazing colleagues from BRAID #Genentech: Gökcen, Ehsan, Alex, @johahi.bsky.social, Nate, @avantikalal.bsky.social, Tommaso, Hector, and Gabriele.

You can find the preprint here: www.biorxiv.org/content/10.1...

12/
Nona: A unifying multimodal masking framework for functional genomics
The non-coding genome encodes complex regulatory logic that orchestrates gene expression and cell identity. While machine learning models for functional genomics have advanced our understanding of the...
November 10, 2025 at 9:01 PM
Working on Nona has been a great learning experience. Each analysis highlights a different aspect of regulatory biology: from predictive modeling and generation to privacy.

Nona’s flexible masking schemes open new directions, and there’s much more to explore. 11/
November 10, 2025 at 9:01 PM
We trained a small fLM on base-resolution ATAC-seq. It can invert the signal to recover genotype information with high accuracy, even with as few as 5 million reads per sample. This has immediate privacy implications for sharing fragment files. 10/
November 10, 2025 at 9:01 PM
Functional genotyping: scATAC-seq has taken off. Fragment files are the de facto file format. They are treated as privacy-preserving, often shared openly even when raw reads are access-controlled. Using AFGR data, we find that common variants alter base-res ATAC-seq profiles. 9/
November 10, 2025 at 9:01 PM
fLMs are also discrete diffusion models! The Nona fLM can generate DNA under functional constraints, e.g., sequences producing weak, strong, left-skewed, or even double-humped DNase-seq profiles.

They support parallel decoding and remain competitive with fewer generation steps. 8/
November 10, 2025 at 9:01 PM
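Parallel decoding in a masked discrete diffusion model can be sketched as iterative unmasking: start from a fully masked sequence and, at each step, commit several of the most confident predictions at once. The sketch below uses a confidence-based schedule of the kind used by MaskGIT-style samplers. `toy_model` is a hypothetical random stand-in for a trained fLM and ignores any functional conditioning; Nona's actual sampler may differ.

```python
import numpy as np

BASES = "ACGT"
MASK = -1  # placeholder token for a not-yet-generated position

def toy_model(tokens, rng):
    # Hypothetical stand-in for a trained fLM: returns a distribution over
    # the 4 bases at every position. A real model would condition on the
    # already-committed bases and on the target functional profile.
    logits = rng.normal(size=(len(tokens), 4))
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def parallel_decode(length, steps, seed=0):
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK)
    for step in range(steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break
        # Commit enough positions per step that everything is filled by the last step.
        k = int(np.ceil(masked.size / (steps - step)))
        probs = toy_model(tokens, rng)[masked]
        choice = probs.argmax(axis=1)       # greedy pick per position (for simplicity)
        confidence = probs.max(axis=1)
        # Unmask the k most confident positions in parallel.
        top = np.argsort(-confidence)[:k]
        tokens[masked[top]] = choice[top]
    return "".join(BASES[t] for t in tokens)

print(parallel_decode(length=32, steps=4))
```

With `length=32` and `steps=4`, eight bases are committed per step, so generation takes 4 model calls instead of the 32 needed for one-base-at-a-time decoding. Real samplers usually draw from the predicted distribution rather than taking the argmax.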
Functional language models (fLM): DNA LMs are great at capturing co-evolutionary sequence patterns, but can't connect them to cell-type specific regulation. An fLM conditioned on GM12878 DNase-seq picks up more transcription factor motif features than plain LMs. 7/
November 10, 2025 at 9:01 PM
The context-aware model also improves predictions of promoter expression across diverse integration sites as measured by TRIP-seq experiments. 6/
November 10, 2025 at 9:01 PM
Turns out the biggest gains are at loci showing outlier chromatin states. Here's an example of a heterochromatinized locus where the sequence-only model gets the locus wrong, but the context-aware model rescues the local prediction. 5/
November 10, 2025 at 9:01 PM
Context-aware models: We improve local genomic predictions by providing flanking track measurements (~196 kb) as input. This outperforms sequence-to-function models by up to 13% on the test set. What's driving these improvements? 4/
November 10, 2025 at 9:01 PM
We highlight 3 novel applications. These are diverse, spanning improved local prediction of functional tracks, conditioning DNA language models on functional tracks, and surprisingly, privacy risks in ATAC-seq fragment files. 3/
November 10, 2025 at 9:01 PM
Multimodal masking provides a unified approach.

Nona operates on both DNA sequence and functional genomics tracks. Task-specific masking configurations recover familiar model types, and its flexibility enables entirely new approaches! 2/
November 10, 2025 at 9:01 PM
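To make "task-specific masking configurations" a bit more concrete, here is a toy sketch in Python. It assumes a single functional track aligned base-by-base with the DNA window and only records which positions are hidden from the model. The helper `masking_config`, the configuration names, the 15% masking rate, and the mapping from masks to model types are illustrative assumptions, not Nona's implementation.

```python
import numpy as np

def masking_config(kind, length, center_width=0, frac=0.15, seed=0):
    """Hypothetical helper (not Nona's code).

    Returns (seq_hidden, track_hidden), boolean arrays of shape (length,),
    where True marks a position hidden from the model's input.
    """
    rng = np.random.default_rng(seed)
    seq_hidden = np.zeros(length, dtype=bool)
    track_hidden = np.zeros(length, dtype=bool)
    if kind == "seq_to_function":
        # Sequence fully visible; predict the whole track.
        track_hidden[:] = True
    elif kind == "dna_lm":
        # Random bases hidden; no track provided at all.
        seq_hidden = rng.random(length) < frac
        track_hidden[:] = True
    elif kind == "functional_lm":
        # Random bases hidden; track fully visible as conditioning.
        seq_hidden = rng.random(length) < frac
    elif kind == "context_aware":
        # Sequence and flanking track visible; hide only the central window.
        start = (length - center_width) // 2
        track_hidden[start:start + center_width] = True
    else:
        raise ValueError(f"unknown configuration: {kind}")
    return seq_hidden, track_hidden

seq_hidden, track_hidden = masking_config("context_aware", length=20, center_width=4)
print(track_hidden.astype(int))  # [0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0]
```

In this reading, a sequence-to-function model hides the whole track, a DNA language model hides random bases and gets no track, an fLM hides random bases but sees the track, and a context-aware model sees the flanking track and predicts only the central window. Which hidden positions contribute to the loss is a separate choice the sketch leaves out.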
Thanks for sharing, but the link isn’t working for me.
March 19, 2025 at 4:51 PM
Hi Wendy, this internship is for current PhD students only.
December 22, 2024 at 6:44 PM