Joanna Masel
joannamasel.bsky.social
Theoretical biologist and advisor to data scientists at the University of Arizona. Mostly theoretical population genetics and molecular evolution, but I've also published in biochemistry, infectious disease, aging, economics, and education. Opinions are my own.
Take-home: substitution matrix quality matters. Instead of using ModelFinder, use IQ-Tree’s QMaker to build your own for your taxon of interest, trained on strictly filtered alignments. And use CLOAK before inferring trees. 10/10
December 4, 2025 at 11:51 PM
For the sequence whose gene tree you are inferring, strict filters hurt, and our new gentle filter CLOAK performs best. Propagating uncertainty from our 16 variant alignments into a consensus among 16 variant trees was worse, which shows that alignment error is systematic, not just random. 9/10
December 4, 2025 at 11:51 PM
Using a substitution model trained on strictly filtered alignment data leads to better inference on gene trees, bringing them closer to the known species tree according to Lin-Rajan-Moret distance (an improved extension of Robinson-Foulds distance). 8/10
December 4, 2025 at 11:50 PM
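To make the tree comparison in 8/10 concrete, here is a minimal sketch of the plain Robinson-Foulds distance, which counts the splits found in one tree but not the other. It is not the Lin-Rajan-Moret distance used in the thread, and the nested-tuple tree encoding, the function names, and the five-taxon toy trees are all assumptions made for illustration.

```python
def leaf_sets(tree, clades):
    """Return the leaves under `tree`, recording the leaf set of each internal node."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, clades) for child in tree))
    clades.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial splits of the tree, treated as unrooted."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    anchor = min(all_leaves)  # name each split by the side without this leaf
    result = set()
    for clade in clades:
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits that separate zero or one leaf from the rest.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical toy trees, written as nested tuples of leaf names.
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
distance = robinson_foulds(species_tree, gene_tree)
```

A smaller distance means the gene tree shares more splits with the species tree. Both trees must contain the same set of leaves for the comparison to be meaningful.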
Stricter filters have stronger effects in reducing exchangeabilities associated with less plausible amino acid substitutions, i.e. those that require more than one mutation, according to the genetic code. 7/10
December 4, 2025 at 11:49 PM
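One way to see which amino acid substitutions "require more than one mutation" in 7/10 is to take the smallest number of differing nucleotide positions between any codon of one amino acid and any codon of the other, under the standard genetic code. The sketch below does that. Using this minimum codon distance as the definition is an assumption, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_nucleotide_changes(aa_from, aa_to):
    """Smallest number of differing positions between any pair of codons."""
    codons_from = [c for c, aa in CODON_TABLE.items() if aa == aa_from]
    codons_to = [c for c, aa in CODON_TABLE.items() if aa == aa_to]
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_from
        for c2 in codons_to
    )


single_step = min_nucleotide_changes("D", "E")  # GAT/GAC vs GAA/GAG
double_step = min_nucleotide_changes("W", "F")  # TGG vs TTT/TTC
```

Pairs with a value of 2 or 3 are the ones the post calls less plausible.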
Phylogenetics relies on substitution models (rates of evolution between amino acids or nucleotides), which are normally decomposed into a symmetric exchangeability matrix and equilibrium frequencies. Filtering reduces exchangeabilities between amino acids not linked by single mutations. 6/10
December 4, 2025 at 11:48 PM
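The decomposition mentioned in 6/10 is usually written as follows for a time-reversible model. The symbols are my own notation, not taken from the thread: $Q$ is the instantaneous rate matrix, $s_{ij}$ the exchangeability between states $i$ and $j$, and $\pi_j$ the equilibrium frequency of state $j$.

```latex
Q_{ij} = s_{ij}\,\pi_j \quad (i \neq j), \qquad
s_{ij} = s_{ji}, \qquad
Q_{ii} = -\sum_{j \neq i} Q_{ij}
```

Under this form, a lower $s_{ij}$ for a given amino acid pair means a lower rate of change between them in both directions, with the frequencies unchanged.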
In a trade-off between precision and recall, CLOAK is the best gentle filter, the “partial filtering” option within Divvier doi.org/10.1093/molb... is the best strict filter, and TAPER and Divvier’s divvying option are in between. GUIDANCE2 and HmmCleaner perform less well. 5/10
December 4, 2025 at 11:47 PM
Our new program CLOAK (CLeaning On Alignment C[K]onsensus) uses 16 variant alignments generated by Muscle5 (4 perturbations of HMM × 4 perturbations of guide tree), and retains only the pairs for which all 16 agree. You can call it within Muscle5 with the option “-cloak”. 4/10
December 4, 2025 at 11:46 PM
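The consensus step described in 4/10 can be sketched as a set intersection. This is not the CLOAK implementation in Muscle5. It assumes a "pair" means two residues from different sequences placed in the same column, it uses two toy variant alignments instead of 16, and the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` maps sequence name -> gapped string ('-' is a gap).
    Each residue is identified as (name, index in the ungapped sequence).
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    counters = {name: 0 for name in names}
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, counters[name]))
                counters[name] += 1
        pairs.update(combinations(present, 2))
    return pairs


def consensus_pairs(alignments):
    """Keep only the residue pairs found in every variant alignment."""
    return set.intersection(*(aligned_pairs(a) for a in alignments))


# Two toy variants that disagree on where residue V of sequence "a" belongs.
variant_1 = {"a": "MKV-L", "b": "MKVAL"}
variant_2 = {"a": "MK-VL", "b": "MKVAL"}
kept = consensus_pairs([variant_1, variant_2])
```

Here the pairs involving the disputed V are dropped, while the three pairs that both variants agree on are kept.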
Unfortunately, when building a gene tree, every bit of information can matter, so filtering alignments (at least with older tools) can make tree inference worse doi.org/10.1093/sysb... 3/10
Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference
December 4, 2025 at 11:45 PM
Trigger warning: the attached images of real multiple sequence alignments may cause feelings of distress among biologists: 2/10
December 4, 2025 at 11:44 PM
I have no idea about whether he is religious. I was responding to what you wrote about how the Nazis would have classified him, which has a clear factual answer.
November 28, 2025 at 6:59 PM
@sashagusevposts.bsky.social, I would love to see a more detailed answer to this in particular. In your blog post, I couldn't find what I would call overwhelming, clearly laid out evidence re GxG and rares being minor contributions to broad-sense heritability in twin studies.
November 25, 2025 at 8:34 PM