Junhao (Bear) Xiong
junhaobearxiong.bsky.social
Machine learning for computational biology. PhD student at Berkeley EECS.
On a personal note, it is at once surreal, gratifying, and humbling to be part of a wet-dry collab. I’m so grateful to my collaborators (also great friends) for making it real + keeping it fun! Also thankful our buildings (BAIR and @innovativegenomics.bsky.social) are right next to each other :)
May 31, 2025 at 3:48 PM
Preprint link: arxiv.org/abs/2505.04823
Paper link to “Unlocking Guidance for Discrete State-Space Diffusion and Flow Models”: openreview.net/forum?id=Xsg...
Guide your favorite protein sequence generative model
Generative machine learning models on sequences are transforming protein engineering. However, no principled framework exists for conditioning these models on auxiliary information, such as experiment...
arxiv.org
May 31, 2025 at 3:46 PM
This work was only made possible through an incredible interdisciplinary collaboration between the Listgarten lab and @savagecatsonly.bsky.social. All kudos go to the amazing team that I’m super grateful to be part of: @hnisonoff.bsky.social @marialukarska.bsky.social (and Ishan and Luke)
May 31, 2025 at 3:46 PM
The guided library in round 2 showed significantly higher activity than the initial unguided library in the experimental base editing assay.
May 31, 2025 at 3:46 PM
We didn't just validate in silico - we also synthesized & tested proteins in the lab. We used ProteinGuide to engineer an adenine base editor for high activity: generated 2,000 variants → tested in bacteria → used results to guide 2,000 new designs.
May 31, 2025 at 3:46 PM
In our third task, we demonstrate the generality of ProteinGuide beyond amino acid sequences, to structure tokens. In particular, we guide ESM3 to generate backbone structures (as tokens) with specified CATH fold class labels.
May 31, 2025 at 3:46 PM
In our second task, we guided ESM3 to re-design enzyme sequences predicted to belong to specific enzyme classes, based on CLEAN, a published classifier for enzyme commission (EC) numbers.
May 31, 2025 at 3:46 PM
In our first task, we guided ProteinMPNN with experimental stability measurements from the @grocklin.bsky.social lab to generate amino acid sequences encoding proteins that are more stable than those ProteinMPNN generates on its own.
May 31, 2025 at 3:46 PM
To illustrate the potential of ProteinGuide, we applied it, in silico, to three tasks, using two representative, well-known protein generative models, ProteinMPNN and ESM3. Across these three tasks, we observed that guidance, as expected, led to the desired outcome.
May 31, 2025 at 3:46 PM
We leverage the fact that masked language models (MLMs, e.g., ESM3), order-agnostic autoregressive (OA-AR) models (e.g., ProteinMPNN), and masking-based diffusion models are actually equivalent. This lets us apply our previously developed guidance methodology for discrete diffusion and flow models to MLMs and OA-AR models.
May 31, 2025 at 3:46 PM
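To make the guidance idea in the post above concrete, here is a minimal toy sketch in Python. It is not the ProteinGuide implementation: the alphabet, `toy_denoiser`, and `toy_predictor` are hypothetical stand-ins for a pretrained masked model and a property predictor. It only illustrates the Bayes-rule reweighting p(token | context, y) ∝ p(token | context) · p(y | context with token), applied while unmasking positions in a random order.

```python
import random

ALPHABET = "ACDE"  # toy alphabet (assumption, not the 20 amino acids)
MASK = "_"         # placeholder for a still-masked position


def toy_denoiser(seq, pos):
    """Hypothetical pretrained model: uniform p(x_pos = a | seq)."""
    return {a: 1.0 / len(ALPHABET) for a in ALPHABET}


def toy_predictor(seq):
    """Hypothetical p(y = 1 | seq) on partially masked input.

    Favors sequences rich in 'A'; masked positions count as non-'A'.
    """
    return (1 + seq.count("A")) / (1 + len(seq))


def guided_distribution(seq, pos, denoiser, predictor):
    """Reweight the model's token probabilities at `pos` by the predictor."""
    prior = denoiser(seq, pos)
    weights = {}
    for token, p in prior.items():
        candidate = seq[:pos] + token + seq[pos + 1:]
        weights[token] = p * predictor(candidate)  # Bayes' rule, unnormalized
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def guided_sample(length, denoiser, predictor, rng):
    """Unmask one position at a time, in a random order, with guidance."""
    seq = MASK * length
    order = list(range(length))
    rng.shuffle(order)
    for pos in order:
        dist = guided_distribution(seq, pos, denoiser, predictor)
        tokens = list(dist)
        token = rng.choices(tokens, weights=[dist[t] for t in tokens])[0]
        seq = seq[:pos] + token + seq[pos + 1:]
    return seq


if __name__ == "__main__":
    print(guided_sample(8, toy_denoiser, toy_predictor, random.Random(0)))
```

In this toy setup the predictor simply rewards the letter "A", so guided samples contain more "A" than the uniform model would produce on its own. With a real model, the denoiser and predictor would be neural networks, and the predictor would need to handle partially masked inputs.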