Interested in microbial genomics, pangenome graphs & evolution 🧬🦠💻
We are curious to check in follow-up work whether these rates and patterns are specific to ST131, or are more general and can be found in other sequence types or even other microbial species.
Hope you'll find the work interesting! Let us know if you have any observations or comments!
To explain the total structural diversity of the dataset, more than 2000 distinct structural variations must have happened in its short evolutionary history, corresponding to an average rate of one every 3 mutations on the core-genome. This is a remarkably high rate!
Most of the IS integrations disrupt genes, and such structural gains would be interpreted as loss events in gene-based analyses. However, gene disruption happens less often than expected by chance, indicating that roughly 2/3 of these integrations have already been removed by purifying selection.
In binary junctions the vast majority of events are gains, often corresponding to an insertion sequence (IS) or prophage integrating in an otherwise conserved region of the genome. This corresponds to a rough rate of one of these events every 20 mutations on the core-genome.
For binary junctions we can go even further: they can be associated with gain or loss events.
In particular, singleton junctions correspond to events on terminal branches of the tree, while non-singleton junctions can in principle also be associated with events on internal branches.
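To make the terminology concrete, here is a minimal Python sketch of how a junction could be labelled from the paths its isolates carry. It is an illustration under assumptions, not the pipeline used in the work: the function name `classify_junction`, the path encoding (a tuple of accessory block ids per isolate) and the rule "singleton = the rarer path is carried by exactly one isolate" are all hypothetical choices.

```python
from collections import Counter
from typing import Dict, Tuple

Path = Tuple[str, ...]  # hypothetical encoding: ordered accessory block ids


def classify_junction(paths: Dict[str, Path]) -> str:
    """Label a junction from the path each isolate carries (illustrative rule)."""
    counts = Counter(paths.values())  # how many isolates carry each distinct path
    if len(counts) == 1:
        return "conserved"            # every isolate shares the same path
    if len(counts) > 2:
        return "non-binary"           # more than two distinct paths
    # Binary junction: assumed singleton if the rarer path occurs in one isolate.
    minority = min(counts.values())
    return "binary singleton" if minority == 1 else "binary non-singleton"


# Toy example: an empty path in three isolates, one IS-carrying path in a fourth.
example = {"iso1": (), "iso2": (), "iso3": (), "iso4": ("IS_block",)}
label = classify_junction(example)
```

Deciding whether a binary junction reflects a gain or a loss additionally needs the tree, which this sketch deliberately leaves out.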
By looking at the content of the junctions, we find that the two peaks in binary junctions are explained by the movement of insertion sequences and prophages respectively, while hotspots are very flexible regions, rich in mobile genetic elements and defense systems.
On the other end of the spectrum we find hotspots: regions with tens to hundreds of distinct paths, and a total accessory genome content of tens to hundreds of kbp.
If we scatter-plot these two quantities for all of the 519 junctions in the dataset, we find that the majority are binary, i.e. they contain only two possible distinct paths, of which one is often empty. Their length distribution is bimodal, with a peak around 1 kbp and another around 30 kbp.
We look at the local graph between two adjacent core blocks, which we call a junction graph. In this graph the diversity can be quantified in terms of the number of distinct paths and the total accessory sequence content.
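As a rough illustration of these two quantities, here is a minimal Python sketch. It assumes each isolate's path through the junction is given as a tuple of accessory block ids, and that "total accessory sequence content" means the summed length of every block that appears in at least one path; the function name `junction_stats` and both of these conventions are assumptions made for the example.

```python
from typing import Dict, Tuple

Path = Tuple[str, ...]  # hypothetical encoding: ordered accessory block ids


def junction_stats(paths: Dict[str, Path], block_len: Dict[str, int]) -> Tuple[int, int]:
    """Return (number of distinct paths, total accessory length in bp)."""
    distinct_paths = set(paths.values())
    # Count every accessory block once, however many paths contain it.
    blocks = {block for path in distinct_paths for block in path}
    total_len = sum(block_len[block] for block in blocks)
    return len(distinct_paths), total_len


# Toy junction: two isolates with an empty path, one carrying a 1.2 kbp block.
paths = {"iso1": (), "iso2": (), "iso3": ("IS_block",)}
block_len = {"IS_block": 1200}
n_paths, acc_len = junction_stats(paths, block_len)
```

Path orientation and strand are ignored here for simplicity; a real pangenome graph would need to track them.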
Next we investigate the structural diversity in the accessory genome. The fact that the order of core-genes is mostly conserved provides a well-defined frame of reference in which to study accessory variation.
However, the fact that synteny is largely conserved across large evolutionary distances, and the fact that many of these changes happen on terminal branches of the tree, indicate that these changes are likely removed by purifying selection.