Jim Shaw
jimshaw.bsky.social
Postdoc at Dana-Farber and Harvard Med with Heng Li (@lh3lh3.bsky.social). Prev: UBC / UofT.

I like thinking about computational biological sequence analysis and its applications to metagenomics.

https://jim-shaw-bluenote.github.io
Reposted by Jim Shaw
What is the best strategy to win any contest?

Eliminate your opponents of course.

Recently, my friend @fernpizza.bsky.social showed how plasmids compete intracellularly (check out his paper published in Science today!). With @baym.lol, we now know they can fight.

www.biorxiv.org/content/10.1...
November 20, 2025 at 10:12 PM
Reposted by Jim Shaw
Hot off the press! Our latest paper led by @fernpizza.bsky.social, understanding how plasmids evolve inside cells. These small, self-replicating DNA circles live inside bacteria and carry antibiotic resistance genes, but also compete with one another to replicate. 1/
www.science.org/doi/10.1126/...
Intracellular competition shapes plasmid population dynamics
From populations of multicellular organisms to selfish genetic elements, conflicts between levels of biological organization are central to evolution. Plasmids are extrachromosomal, self-replicating g...
November 20, 2025 at 9:42 PM
Reposted by Jim Shaw
“Bin Chicken” is now published in Nature Methods! It substantially improves genome recovery through rational coassembly 🧬🖥️. Applied to public 🌍 metagenomes, we recovered 24,000 novel species 🦠, including 6 new phyla.
doi.org/10.1038/s415...
@benjwoodcroft.bsky.social @rhysnewell.bsky.social
🧵1/6
November 13, 2025 at 10:09 AM
Reposted by Jim Shaw
Super excited that the bulk of my PhD work is now preprinted! Here we used whole-community competition, or coalescence, experiments to quantify selection acting on genetically diverged strains within larger communities. (1/n)
www.biorxiv.org/content/10.1...
November 11, 2025 at 5:15 PM
Reposted by Jim Shaw
🚨New preprint out!
We present a foundational genomic resource of human gut microbiome viruses. It delivers high-quality, deeply curated data spanning taxonomy, predicted hosts, structures, and functions, providing a reference for gut virome research. (1/8)
www.biorxiv.org/content/10.1...
November 6, 2025 at 5:26 PM
Reposted by Jim Shaw
Excited to share our LongTrack study out in @natmicrobiol.nature.com today!

Fecal microbiota transplant (FMT), donor 💩 => patients' gut, is an effective treatment for recurrent C. difficile infection & is being evaluated for Inflammatory Bowel Diseases (IBD) & other conditions 1/

📄 rdcu.be/eL8mR
Long-read metagenomics for strain tracking after faecal microbiota transplant
Nature Microbiology - A long-read metagenomics method empowers faecal microbiota transplantation studies by precisely tracking bacteria from donors to recipients, distinguishing co-existing strains...
October 22, 2025 at 3:39 PM
Reposted by Jim Shaw
Our @narjournal.bsky.social manuscript is out! It explores the growth of the GTDB (gtdb.ecogenomic.org) since its inception, as well as updates to the website, methodology, policies, and major taxonomic and nomenclatural changes over the past three years.

academic.oup.com/nar/advance-...
GTDB release 10: a complete and systematic taxonomy for 715 230 bacterial and 17 245 archaeal genomes
Abstract. The Genome Taxonomy Database (GTDB; https://gtdb.ecogenomic.org) provides a phylogenetically consistent and rank normalized genome-based taxonomy
October 22, 2025 at 2:20 PM
Reposted by Jim Shaw
Our preprint on our new metagenomic HiFi assembler Alice is out 🥳 Based on a *new sketching method* (🧵1/6)
👉 Preprint www.biorxiv.org/content/10.1...
👉 Github github.com/rolandfaure/...
Alice: fast and haplotype-aware assembly of high-fidelity reads based on MSR sketching
We introduce Mapping-friendly Sequence Reduction (MSR) sketches, a sketching method for high-fidelity (HiFi) long reads, and Alice, an assembler that operates directly on these sketches. MSR produces ...
October 3, 2025 at 2:51 PM
Reposted by Jim Shaw
New preprint from the Banfield lab, highlighting an interesting case of 1.5 Mb megaplasmids found in the human gut.

Plasmid genomes were resolved using #PacBio HiFi sequencing with hifiasm-meta for #metagenome assembly. Host association was detected using epigenetic signals.

doi.org/10.1101/2025...
Megaplasmids associate with Escherichia coli and other Enterobacteriaceae
Humans and animals are ubiquitously colonized by Enterobacteriaceae, a bacterial family that contains both commensals and clinically significant pathogens. Here, we report Enterobacteriaceae megaplas...
October 1, 2025 at 4:44 PM
Reposted by Jim Shaw
Do you know ~60% of human SVs fall in ~1% of GRCh38? See our new preprint: arxiv.org/abs/2509.23057 and the companion blog post on how we started this project and longdust: lh3.github.io/2025/09/29/o.... Work with Alvin Qin
September 30, 2025 at 2:19 AM
Reposted by Jim Shaw
High-accuracy SNV calling for bacterial isolates using deep learning with AccuSNV https://www.biorxiv.org/content/10.1101/2025.09.26.678787v1
September 29, 2025 at 6:47 PM
Reposted by Jim Shaw
Delighted to see our paper studying the evolution of plasmids over the last 100 years, now out! Years of work by Adrian Cazares, also Nick Thomson @sangerinstitute.bsky.social - this version much improved over the preprint. Final version should be open access, apols.
Thread 1/n
September 25, 2025 at 9:29 PM
Reposted by Jim Shaw
New blog post!

metaMDBG (@gaetanbenoit.bsky.social) and Myloasm (@jimshaw.bsky.social) have had recent releases, so I updated the benchmarks from the Autocycler paper:
rrwick.github.io/2025/09/23/a...

Both tools improved considerably! Time to update your conda environments 😄
Benchmark update: metaMDBG and Myloasm
a blog for miscellaneous bioinformatics stuff
September 23, 2025 at 1:53 AM
Reposted by Jim Shaw
Many of the most complex and useful functions in biology emerge at the scale of whole genomes.

Today, we share our preprint “Generative design of novel bacteriophages with genome language models”, where we validate the first functional AI-generated genomes 🧵
September 17, 2025 at 3:03 PM
Reposted by Jim Shaw
agtools: a software framework to manipulate assembly graphs https://www.biorxiv.org/content/10.1101/2025.09.14.676178v1
September 16, 2025 at 8:48 PM
Reposted by Jim Shaw
X-Mapper 🦠🧬🧪 - a sequence aligner developed for microbes, now on Bioconda! 🚀
• 11–24× fewer suboptimal alignments (same for human genome)
• 3–579× lower inconsistency
• improves on ~30% of reads aligned to non-target species
github.com/mathjeff/map...
bioconda.github.io/recipes/x-ma...
#microsky
September 15, 2025 at 2:32 AM
Reposted by Jim Shaw
New blog post – A quick look at Roche's SBX
lh3.github.io/2025/09/11/a...
September 12, 2025 at 3:26 AM
Reposted by Jim Shaw
I sincerely appreciate the opportunity to visit @ebi.embl.org (thanks to the @embl.org Sabbatical fellowship). The guidance and support I received from Zam (@zaminiqbal.bsky.social), John (@bacpop.org) and other colleagues have been immensely valuable! You changed my career!❤️
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social joined our group for a year on an @embl.org sabbatical. While here, he developed a new way of aligning to millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
September 10, 2025 at 9:56 AM
Reposted by Jim Shaw
Sometimes you meet absolutely incredible bioinfo-magicians.
It was a huge privilege when @shenwei356.bsky.social joined our group for a year on an @embl.org sabbatical. While here, he developed a new way of aligning to millions of bacteria, called LexicMap 1/n
www.nature.com/articles/s41...
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap - Nature Biotechnology
LexicMap uses a fixed set of probes to efficiently query gene sequences for fast and low-memory alignment.
September 10, 2025 at 9:12 AM
Reposted by Jim Shaw
Now preprinted at arxiv.org/abs/2509.07357
September 10, 2025 at 2:10 AM
Reposted by Jim Shaw
How do you long-read sequence metagenomes? I would argue it starts with the right sample storage & DNA extraction, to enable efficient @nanoporetech.com / @pacbio.bsky.social sequencing, which we investigated in our new paper: www.biorxiv.org/content/10.1...

Massive thanks to Klara for driving this
September 9, 2025 at 3:35 PM
Preprint out for myloasm, our new nanopore / HiFi metagenome assembler!

Nanopore's getting accurate, but

1. Can this lead to better metagenome assemblies?
2. How, algorithmically, to leverage them?

with co-author Max Marin @mgmarin.bsky.social, supervised by Heng Li @lh3lh3.bsky.social

1 / N
High-resolution metagenome assembly for modern long reads with myloasm https://www.biorxiv.org/content/10.1101/2025.09.05.674543v1
September 7, 2025 at 11:35 PM
Reposted by Jim Shaw
Check out Ryan's new blogpost, especially if you work on and polish small eukaryotic genome assemblies - it's always nice when someone adds new features for your tools
New blog post!

I added a new feature to @gbouras13.bsky.social's Pypolca: homopolymer-only polishing. Potentially useful for cross-sample polishing - early test on Cryptosporidium looks promising.

Check it out here:
rrwick.github.io/2025/09/04/h...
Cross-sample homopolymer polishing with Pypolca
a blog for miscellaneous bioinformatics stuff
September 5, 2025 at 6:20 AM
Reposted by Jim Shaw
Now published in GigaScience with minor improvements: academic.oup.com/gigascience/...

* Download: zenodo.org/records/1490...
* More info: github.com/lh3/panmask
Preprint on "Finding easy regions for short-read variant calling from pangenome data": arxiv.org/abs/2507.03718
September 4, 2025 at 4:44 PM
Reposted by Jim Shaw
🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of 🧬 DNA sequencing data 🧬 from the far reaches of our planet. 🦠🍄🌵

Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.

doi.org/10.1101/2024...
September 3, 2025 at 8:39 AM