Koushik
koushikn.bsky.social
Data Scientist Generative AI @BayerCropScience. ML for Plant Biology. PhD @IowaStateUniversity https://www.linkedin.com/in/koushik-nagasubramanian/
Reposted by Koushik
@jengreitz.bsky.social & my lab want to co-hire a computational biologist/biostatistician with project management expertise to help map the regulatory code of the human genome and discover genetic mechanisms of disease.

Details below
careersearch.stanford.edu/jobs/computa...

Plz RT
August 19, 2025 at 12:29 AM
Reposted by Koushik
In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known.

Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.
July 25, 2025 at 10:05 PM
Reposted by Koushik
An assessment of DNA language models concludes:
◼️ They do not offer compelling gains over baseline models

◼️ Their performance is inconsistent and requires much more compute.

arxiv.org/abs/2412.05430
June 23, 2025 at 8:21 PM
Reposted by Koushik
Our structural core gene pipeline Unicore is now published at GBE
📄 doi.org/10.1093/gbe/...

Please also check out @dongwookkim.bsky.social’s
🧵 bsky.app/profile/dong...
June 3, 2025 at 5:19 PM
Reposted by Koushik
The only reason you love chocolate is because of FUNGUS.

Cacao seeds contain high amounts of polyphenols, making them intensely bitter & unpleasant. There are two natural fungi that do the heavy lifting in turning them into chocolate.

Let's do a quick tour of the process of chocolate making.
May 26, 2025 at 9:19 PM
Reposted by Koushik
Three BioML starter packs now!

Pack 1: go.bsky.app/2VWBcCd
Pack 2: go.bsky.app/Bw84Hmc
Pack 3: go.bsky.app/NAKYUok

DM if you want to be included (or nominate people who should be!)
December 3, 2024 at 3:27 AM
Reposted by Koushik
AFESM: a metagenomic guide through the protein structure universe! We clustered 821M structures (AFDB&ESMatlas) into 5.12M groups; revealing biome-specific groups, only 1 new fold even after AlphaFold2 re-prediction & many novel domain combos. 🧵
🌐 afesm.foldseek.com
📄 www.biorxiv.org/content/10.1...
April 27, 2025 at 12:13 AM
Reposted by Koushik
Super excited to share our review on genomic deep learning models for non-coding variant effect prediction, with Ayesha Bajwa and Nilah Ioannidis. We’d like this review to be a useful resource, and welcome any feedback, comments, or questions! 1/4

arxiv.org/abs/2411.11158
Leveraging genomic deep learning models for non-coding variant effect prediction
The majority of genetic variants identified in genome-wide association studies of complex traits are non-coding, and characterizing their function remains an important challenge in human genetics. Gen...
arxiv.org
November 20, 2024 at 1:31 AM
Reposted by Koushik
Mechanistic interpretability on a protein language model

www.biorxiv.org/content/10.1...
November 18, 2024 at 10:20 PM
Reposted by Koushik
Two BioML starter packs now:

Pack 1: go.bsky.app/2VWBcCd
Pack 2: go.bsky.app/Bw84Hmc

DM if you want to be included (or nominate people who should be!)
November 18, 2024 at 5:09 PM
Reposted by Koushik
DEGU distills an ensemble of models into a single model, retaining the ensemble’s predictive performance while providing uncertainty estimates, i.e., both epistemic (or model) and aleatoric (or data) uncertainty.

Led by @zrcjessica

Paper: www.biorxiv.org/content/10.1...

2/n
Uncertainty-aware genomic deep learning with knowledge distillation
Deep neural networks (DNNs) have advanced predictive modeling for regulatory genomics, but challenges remain in ensuring the reliability of their predictions and understanding the key factors behind t...
www.biorxiv.org
November 16, 2024 at 4:14 PM
Reposted by Koushik
Large protein language models can learn complex epistatic interactions, but how much does that help with predicting variant effects? In this NeurIPS article, we show that classical independent-sites phylogenetic models can outperform pLMs on this task.
1/7
openreview.net/forum?id=H7m...
Ultrafast classical phylogenetic method beats large protein...
Amino acid substitution rate matrices are fundamental to statistical phylogenetics and evolutionary biology. Estimating them typically requires reconstructed trees for massive amounts of aligned...
openreview.net
November 16, 2024 at 8:42 PM
Reposted by Koushik
Thrilled to announce Boltz-1, the first open-source and commercially available model to achieve AlphaFold3-level accuracy on biomolecular structure prediction! An exciting collaboration with Jeremy, Saro, and an amazing team at MIT and Genesis Therapeutics. A thread!
November 17, 2024 at 4:20 PM
Reposted by Koushik
I tried to make a bioml starter pack. DM if you want me to add or remove you?

go.bsky.app/2VWBcCd
Anybody have a bioml starter pack?
November 11, 2024 at 11:45 PM