Jeff Spence
@jeffspence.github.io
assistant professor at ucsf interested in genetics, statistics, etc…

Basically, when a variant gets lucky and drifts to high frequency, it is more likely to be a hit for every trait that it affects. This makes top hits APPEAR more pleiotropic, when in fact they’re actually LESS pleiotropic on average.

18/n
November 7, 2025 at 12:05 AM
This randomness in variant frequencies also results in what seems a paradox: top GWAS hits are both more trait specific AND more likely to be hits for other traits!

17/n
November 7, 2025 at 12:05 AM
GWAS prioritize individual variants, and sometimes variants just happen to get lucky and drift to high frequencies. This plays a surprisingly large role in prioritization in simulated GWAS.

16/n
November 7, 2025 at 12:05 AM
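To make the "lucky drift" point in 16/n concrete, here is a minimal sketch of why frequency matters for ranking. It is not the simulation from the thread. It uses the standard approximation that, for a standardized trait and a small additive effect, a variant at frequency p explains about 2p(1-p)β² of trait variance, so its expected chi-square statistic is roughly 1 + N·2p(1-p)β². The function name, sample size, and effect size are all made up for illustration.

```python
def expected_chi2(n, freq, beta):
    """Approximate expected GWAS chi-square statistic for one variant.

    Assumes a standardized trait, an additive per-allele effect `beta`,
    Hardy-Weinberg genotype frequencies, and a small effect, so the variant
    explains roughly 2 * freq * (1 - freq) * beta**2 of trait variance.
    """
    variance_explained = 2 * freq * (1 - freq) * beta ** 2
    return 1 + n * variance_explained


# Hypothetical numbers: same effect size, two different allele frequencies.
n, beta = 100_000, 0.02
rare = expected_chi2(n, 0.01, beta)    # variant that stayed rare
common = expected_chi2(n, 0.30, beta)  # same variant after drifting up

print(f"rare: {rare:.2f}, common: {common:.2f}")
```

With an identical effect on the trait, the variant that happened to reach a higher frequency gets a much larger expected test statistic, and the same boost applies to every other trait it affects.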
Specificity is not the end of the story. Burden tests and GWAS also prioritize genes based on things that have absolutely nothing to do with traits.

Burden tests aggregate signal across variants. Long genes have more variants, and so tend to get prioritized higher.

15/n
November 7, 2025 at 12:05 AM
In contrast, HHIP (hedgehog interacting protein) has tons of GWAS hits near it, and basically no burden signal. HHIP makes sense as a height hit, but it’s also been implicated in COPD, pituitary hormone deficiency, etc… It’s NOT height specific!

14/n
November 7, 2025 at 12:05 AM
NPR2 is the second most significant gene in the burden tests, but is in only the 243rd most significant GWAS locus.

Homozygous LoFs in NPR2 cause severe short stature, but don’t affect intelligence, facial features, etc… seems like NPR2 is height specific!

13/n
November 7, 2025 at 12:05 AM
To recap:

Burden tests prioritize trait-SPECIFIC GENES.

GWAS prioritize genes near trait-SPECIFIC VARIANTS.

Looking at height gives a couple of really nice examples: NPR2 and HHIP.

12/n
November 7, 2025 at 12:05 AM
Both of these ways of being specific contribute to variants being ranked highly in GWAS:

1. Coding variants in specifically-expressed genes are more highly ranked

AND

2. Non-coding variants in tissue-specific ATAC peaks are more highly ranked.

11/n
November 7, 2025 at 12:05 AM
What about GWAS?

GWAS prioritize genes near trait-specific VARIANTS, whereas burden tests prioritize trait specific GENES.

Variants can be specific because they act on trait-specific genes, or because they act on pleiotropic genes in a context-specific way.

10/n
November 7, 2025 at 12:05 AM
Using theory developed by @yuvalsim.bsky.social and @gs2747.bsky.social (journals.plos.org/plosbiology/...) we predicted that burden tests rank genes by SPECIFICITY!

This is surprising! Burden tests DO NOT rank genes by IMPORTANCE!

These predictions play out in the UKB.

8/n
November 7, 2025 at 12:05 AM
Instead, one might want to rank genes by their trait SPECIFICITY -- how much do they affect the study trait relative to their effects across all traits. A trait-specific gene might be more "core" to trait biology.

7/n
November 7, 2025 at 12:05 AM
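One way to picture the distinction in 7/n between importance and specificity is the toy calculation below. It assumes specificity can be summarized as the share of a gene's total squared effect that falls on the study trait. That is an illustrative choice, not necessarily the definition used in the work the thread describes, and the gene names and effect sizes are hypothetical.

```python
def specificity(effects, study_trait):
    """Share of a gene's total squared effect that falls on the study trait.

    `effects` maps trait name -> effect of the gene on that trait.
    This ratio is an assumed, illustrative definition.
    """
    total = sum(e ** 2 for e in effects.values())
    if total == 0:
        return 0.0
    return effects[study_trait] ** 2 / total


# Hypothetical genes: one with a modest, focused effect on height,
# one with a larger height effect that is spread across many traits.
specific_gene = {"height": 0.5, "trait_b": 0.0, "trait_c": 0.05}
pleiotropic_gene = {"height": 0.8, "trait_b": 0.9, "trait_c": 0.7}

spec_focused = specificity(specific_gene, "height")
spec_broad = specificity(pleiotropic_gene, "height")
print(f"focused gene: {spec_focused:.2f}, broad gene: {spec_broad:.2f}")
```

In this toy setup the pleiotropic gene has the larger effect on height (more "important"), yet the focused gene scores far higher on specificity, which is the quantity the thread argues burden tests end up ranking by.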
We checked for ourselves, and found:

1. Most burden hits are near a GWAS hit (they converge!)

BUT

2. The ranking of hits is surprisingly discordant. E.g., the second most significant burden hit for height is ranked 243rd in GWAS!

4/n
November 7, 2025 at 12:05 AM
GWAS and burden tests both regress trait values against genetic variation.

In line with this conceptual similarity, previous work (link.springer.com/article/10.1... , www.biorxiv.org/content/10.1...) suggested these tests “converge” on the same genes.

2/n
November 7, 2025 at 12:05 AM
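The conceptual similarity in 2/n can be sketched in a few lines: both tests are regressions of the trait on a genotype, and they differ in what the genotype is. This is a toy simulation under made-up assumptions (one common variant, 20 rare loss-of-function variants in a single hypothetical gene, invented frequencies and effect sizes), not the analysis pipeline from the thread.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate(n_people=5000, seed=0):
    """Simulate genotypes and a trait for one hypothetical gene."""
    rng = random.Random(seed)
    common_freq, rare_freq, n_rare = 0.3, 0.005, 20  # assumed frequencies
    common_beta, lof_beta = 0.2, -1.0                # assumed effect sizes
    common, burden, trait = [], [], []
    for _ in range(n_people):
        # Diploid genotype at the common variant: 0, 1, or 2 copies.
        g_common = sum(rng.random() < common_freq for _ in range(2))
        # Burden genotype: total rare LoF alleles carried across the gene.
        g_burden = sum(
            rng.random() < rare_freq for _ in range(2 * n_rare)
        )
        y = common_beta * g_common + lof_beta * g_burden + rng.gauss(0, 1)
        common.append(g_common)
        burden.append(g_burden)
        trait.append(y)
    return common, burden, trait


common, burden, trait = simulate()
gwas_slope = ols_slope(common, trait)    # GWAS-style: one variant at a time
burden_slope = ols_slope(burden, trait)  # burden-style: aggregated rare alleles
print(f"GWAS slope: {gwas_slope:.2f}, burden slope: {burden_slope:.2f}")
```

The regression machinery is identical in both calls. Only the predictor changes: a single variant's allele count for the GWAS-style test, and a per-gene count of rare alleles for the burden-style test.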
Overall, many of our models did a pretty good job of ranking which guides would drive expression the most, but almost all of that performance came from differences across genes.
When trying to predict which guide would have the largest effect on the expression of a particular gene, our results were more mixed.
May 30, 2025 at 2:45 AM
And we considered a lot of different models of what dCas9-p300 actually does to chromatin tracks (How far away does p300 acetylate? How strongly would it increase signal in a ChIP-seq track? Does this interact with nucleosome occupancy? Do nucleosomes interfere with guide binding?)
May 30, 2025 at 2:45 AM
On the experimental side, we can use dCas9 fused to a chromatin modifier to locally alter chromatin structure. We can then read out how those modifications affect gene expression.
This lets us directly test the “causal understanding” of our deep learning models!
May 30, 2025 at 2:45 AM
E.g., for height we also don't find a lot of loci that have significant burden hits but lack a GWAS hit. But we do find the opposite quite a bit.

We were also struck by how discordant the rankings are, and arguably when you have this many significant loci, some kind of ranking is necessary. 5/6
March 28, 2025 at 1:22 AM
😭😭😭
January 8, 2025 at 6:00 PM
Basically, when a variant gets lucky and drifts to high frequency, it is more likely to be a hit for every trait that it affects. This makes top hits APPEAR more pleiotropic, when in fact they’re actually LESS pleiotropic on average. 16/n
December 17, 2024 at 7:05 AM
This randomness in variant frequencies also results in what seems a paradox: top GWAS hits are both more trait specific AND more likely to be hits for other traits! 15/n
December 17, 2024 at 7:05 AM
GWAS prioritize individual variants, and sometimes variants just happen to get lucky and drift to high frequencies. This plays a surprisingly large role in prioritization in simulated GWAS. 14/n
December 17, 2024 at 7:05 AM
Specificity is not the end of the story. Burden tests and GWAS also prioritize genes based on things that have absolutely nothing to do with traits.

Burden tests aggregate signal across variants. Long genes have more variants, and so tend to get prioritized higher. 13/n
December 17, 2024 at 7:05 AM
To recap so far: burden tests prioritize trait-specific genes; GWAS prioritize genes near trait-specific variants.

Height GWAS and burden tests give a couple of really nice examples: NPR2 and HHIP. 10/n
December 17, 2024 at 7:05 AM
We find that both of these ways of being specific contribute to variants being ranked highly in GWAS:

1) Coding variants in specifically-expressed genes are more highly ranked
2) Non-coding variants in tissue-specific ATAC peaks are more highly ranked

9/n
December 17, 2024 at 7:05 AM
What about GWAS?
GWAS prioritize genes near trait-specific VARIANTS. This is profoundly different from prioritizing trait-specific GENES. Variants can be specific because they act on trait-specific genes, or because they act on pleiotropic genes in a context-specific way. 8/n
December 17, 2024 at 7:05 AM