Possu Huang Lab
@possuhuanglab.bsky.social
Our lab uses experimental and computational methods to design de novo proteins | @Stanford
Pinned
1/ In two back-to-back papers, we present our de novo TRACeR platform for targeting MHC-I and MHC-II antigens

TRACeR for MHC-I: go.nature.com/4gcLzn5
TRACeR for MHC-II: go.nature.com/4gj5OQk
Work done by Yilin Chen, @tianyu.bsky.social, Cizhang Zhao and @hkws.bsky.social. Thank you all! (7/8)
SLAE projects all-atom structures onto a smooth manifold! Unguided linear interpolation between conformations in SLAE latent space decodes to coherent intermediate structures. (6/8)
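A minimal sketch of what unguided linear interpolation between two latent vectors looks like. This is not SLAE code: the function name `interpolate_latents` and the flat-list representation of a latent are assumptions made for illustration, and the decoding step that turns each intermediate latent back into a structure is not shown.

```python
from typing import List, Sequence


def interpolate_latents(
    z_start: Sequence[float], z_end: Sequence[float], num_steps: int
) -> List[List[float]]:
    """Return num_steps latents evenly spaced on the line from z_start to z_end.

    Hypothetical helper: a latent is treated as a flat list of floats.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2 to include both endpoints")
    if len(z_start) != len(z_end):
        raise ValueError("both latents must have the same dimension")

    path = []
    for step in range(num_steps):
        t = step / (num_steps - 1)  # t runs from 0.0 (start) to 1.0 (end)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Usage: five points between two toy 2-D latents.
path = interpolate_latents([0.0, 2.0], [4.0, -2.0], num_steps=5)
```

Each entry of `path` would then be passed to a decoder to obtain an intermediate structure. "Unguided" here means no extra constraint or energy term steers the path.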
SLAE extends our generative coverage assessment SHAPES to all-atom, per-residue-type granularity. Now we can compare de novo all-atom protein design models and spot residue-level environment biases. (5/8)
Rich in atomic-environment signal, SLAE features outperform PLMs and task-specific models across diverse, challenging downstream tasks, including binding affinity, thermostability and chemical shift prediction. All-atom structure pretraining is all you need! (4/8)
The SLAE latent landscape is organized in meaningful ways beyond amino acid identity. It separates residue embeddings along features including solvent accessibility, secondary structure and structural nativeness. (3/8)
We design a deliberately hard two-part task to learn compact, expressive features: a local graph encoder projects each residue’s atomic interactions into a feature vector, while a global decoder learns to compose these local environment tokens into coherent macromolecules. (2/8)
Introducing SLAE, our new framework to represent all-atom protein structures with residue local chemical environment tokens!
SLAE reasons over atomic interactions to recover structures and residue pairwise energetics, yielding a generalizable, physics-informed latent space. (1/8)
💻 Sampling and training code for Protpardelle-1c is now available: github.com/ProteinDesig...

Feedback and requests are welcome!
Our new set of all-atom models can sample plausible sidechains without stage-2 sampling. Sequence-dependent partial diffusion behavior occurs when we mask the dummy atoms.
We achieve competitive results on MotifBench and the RFdiffusion/La-Proteina motif scaffolding benchmarks with both backbone-only and all-atom models, proposing scaffolds for previously unsolved problems.
We have a new collection of protein structure generative models which we call Protpardelle-1c. It builds on the original Protpardelle and is tailored for conditional generation: motif scaffolding and binder generation.
We include some additional analysis in the supplement, including secondary structure distributions.
SHAPES now published in Cell Systems!
New preprint from our group! We propose SHAPES, a set of metrics to quantify the distributional coverage of generative models of protein structures with embeddings at different structural hierarchies and quantify undersampling / extrapolation behaviors.
Reposted by Possu Huang Lab
A framework for evaluating how well generative models of protein structure match the distribution of natural structures.

@possuhuanglab.bsky.social

www.biorxiv.org/content/10.1...
Generative models capture a biased set of protein structure space. Generative models do not capture the full expressivity of PDB structures. Protein structure embeddings reveal undersampled and de novo structure space.
Our supplement has many additional figures of the rasterized protein structure space, stratified by designable and not designable and spatially organized by ESM3 and ProtDomainSegmentor embeddings.
One consequence of unbiased sampling of protein structure space is a higher likelihood of finding TERtiary Motifs (TERMs) which involve complex loops, with implications for functional protein design (see Figure 5 legend for group labels).
Inspired by the FPD metric in EvoDiff for protein sequence distributions, we compute Fréchet distance using protein structure embeddings, also subsetted to designable and non-designable samples (FPD-D and FPD-ND).
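For reference, Fréchet-distance metrics of this family (FID is the best-known example) usually fit a Gaussian to each set of embeddings and compare the two fits in closed form. Whether FPD-D and FPD-ND use exactly this form is an assumption here, and the symbols are illustrative: $\mu_g, \Sigma_g$ are the mean and covariance of embeddings of generated structures, $\mu_r, \Sigma_r$ those of the reference structures.

```latex
d_F^2 \;=\; \lVert \mu_g - \mu_r \rVert_2^2
      \;+\; \operatorname{Tr}\!\left( \Sigma_g + \Sigma_r - 2\,(\Sigma_g \Sigma_r)^{1/2} \right)
```

The distance is zero when the two fitted Gaussians coincide, and it grows as either the means or the covariances diverge.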