Aditi Merchant
adititm.bsky.social
BioE PhD student @ Stanford in the Hie Lab // ML for Synthetic Biology
Together, this work suggests that genomic sequence models can meaningfully generalize beyond characterized natural evolution. Looking forward, we hope that semantic design can serve as a starting point for function-guided design and optimization of genes across biology.
November 19, 2025 at 4:37 PM
Beyond providing novel sequences for functions of interest, SynGenome can be used to predict the roles of domains of unknown function, reveal functional associations across prokaryotic biology, and catalog chimeric proteins with unique domain combinations generated by Evo.
November 19, 2025 at 4:37 PM
Next, we designed anti-CRISPR (Acr) proteins. Evo generated functional Acr proteins that protected against SpCas9, despite some having no sequence or predicted structural similarity to known Acrs. This further supported the idea that Evo could generalize based on context alone.
November 19, 2025 at 4:35 PM
We next asked if semantic design could co-design more evolutionarily diverse sequences. Focusing on toxin–antitoxin systems, we successfully generated a functional RNA antitoxin, a de novo toxic gene, and broadly neutralizing antitoxins. Many had <30% sequence identity to nature.
November 19, 2025 at 4:35 PM
We first tested if Evo understands genomic context. Given partial sequences of conserved genes, we show that Evo can achieve near-perfect amino acid sequence recovery and complete entire operons bidirectionally, all while still producing diverse underlying DNA sequences.
November 19, 2025 at 4:34 PM
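To make "near-perfect amino acid recovery with diverse underlying DNA" concrete, here is a minimal Python sketch. The two DNA sequences are hypothetical toy examples (not from the paper), and `percent_identity` is a simple ungapped, position-by-position comparison assumed for illustration; real analyses would typically compare aligned sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length for ungapped identity")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical toy sequences: same protein, different synonymous codons.
natural_dna = "ATGAAACTGGCGTCT"
generated_dna = "ATGAAGTTAGCCAGC"

protein_recovery = percent_identity(translate(natural_dna), translate(generated_dna))
dna_identity = percent_identity(natural_dna, generated_dna)
print(f"amino acid recovery: {protein_recovery:.1f}%")
print(f"nucleotide identity: {dna_identity:.1f}%")
```

Because most amino acids are encoded by several codons, two sequences can agree fully at the protein level while differing at many nucleotide positions, which is the pattern described in the post above.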
Genomic language models like Evo can leverage this: by prompting with natural genomic context containing genes related to a function of interest, we can ‘autocomplete’ sequences with novel, diverse generations enriched for similar functions. We call this semantic design.
November 19, 2025 at 4:34 PM
Just as word meaning emerges from context—"you shall know a word by the company it keeps"—prokaryotic gene function is tied to genomic context. This notion of guilt by association, where related genes cluster in operons, has led to the discovery of many molecular tools like CRISPR, BGCs, and more.
November 19, 2025 at 4:33 PM
In recent years, we’ve seen immense progress in leveraging generative AI to accelerate biological design. However, using these models to produce diverse sequences with desired high-level functions remains challenging.
November 19, 2025 at 4:32 PM
What if we could autocomplete DNA based on function?

Today in @Nature, we share semantic design—a strategy for function-guided design with genomic language models that leverages genomic context to create de novo genes and systems with desired functions. 🧵

www.nature.com/articles/s41...
November 19, 2025 at 4:31 PM
Finally, to apply semantic mining to generate functional genes from across prokaryotic biology, we developed SynGenome, a database containing over 120 billion base pairs of synthetic DNA sequences. 9/N
December 19, 2024 at 6:54 PM
Despite this high diversity, 17% of the Acr designs we tested were functional. Additionally, many of our experimentally validated Acrs had low-confidence AF3 structure predictions, and two eluded significant structural or sequence characterization, making them akin to “de novo” genes (!) 8/N
December 19, 2024 at 6:54 PM
We then applied semantic mining to see if we could design new anti-CRISPR (Acr) proteins, a highly diverse group of proteins with limited sequence or structural conservation thought to sometimes emerge via de novo gene birth. 7/N
December 19, 2024 at 6:54 PM
Half of the Evo-designed antitoxins we experimentally tested were functional (!), with most possessing only remote homology to natural proteins and some appearing to neutralize diverse toxin classes. 6/N
December 19, 2024 at 6:54 PM
We then applied semantic mining to generate a multi-gene bacterial toxin-antitoxin (TA) system. Using context from known TA systems as prompts, we first designed and experimentally validated a toxin gene. This toxin gene then served as a prompt for Evo to generate new conjugate antitoxins. 5/N
December 19, 2024 at 6:54 PM
As an initial test, we demonstrated that Evo 1.5, a new version of Evo with extended pretraining, understands genomic context: it could complete highly conserved genes and operons when prompted with only partial sequences. 4/N
December 19, 2024 at 6:54 PM
Taking inspiration from genome mining techniques using guilt-by-association, we hypothesized that by prompting Evo with a gene encoding a desired function, we could guide the model to generate a new gene with a related function. We term this approach “semantic mining.” 3/N
December 19, 2024 at 6:54 PM
Excited to have the first project of my PhD out!! By leveraging the genomic language model Evo’s ability to learn relationships across genes (i.e., "know a gene by the company it keeps"), we show that we can use prompt engineering to generate highly divergent proteins with retained functionality. 🧵1/N
December 19, 2024 at 6:54 PM