John Marshall
@johnm.bsky.social
Bioinformatics tools developer at Australia’s Centre for Population Genomics. Will edit GA4GH specifications for food. New Zealander fairly recently returned after a decade in the U.K.
It's even wilder than that — there was a parenthetical that I dropped from that posting due to the character limit:

Surprising that it worked reliably (or even AT ALL) on other platforms! 🤷
November 19, 2025 at 8:51 PM
Hmmm… Maybe as an author of the latter paper, what I'm supposed to say is “cite both”… 🤔🤣
November 17, 2025 at 10:21 PM
For the spec meaning of reference, we have a back-burner plan to split BGZF+indexing out into a separate document (partly because it's not SAM-specific but shared by compressed VCF too), but since SAM's introduction BGZF has been described in a section of the SAM specification.
Samtools - Documentation
Samtools
www.htslib.org
November 17, 2025 at 10:19 PM
So IMHO you won't do better than the original paper.

Note that early editions of the SAM spec credited BGZF like this:

“To achieve smaller file size, we always compress a BAM file with the BGZF library, developed by Bob Handsaker.”

Such text was elided when the spec was later rewritten in LaTeX.
November 17, 2025 at 10:19 PM
For the citation meaning of reference, there's a sentence in the original SAM & SAMtools paper:

“BAM is compressed by the BGZF library, a generic library developed by us to achieve fast random access in a zlib-compatible compressed file.”

It's mentioned in our later HTSlib paper, but more briefly.
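
To make "fast random access in a zlib-compatible compressed file" concrete, here is a rough Python sketch. It assumes the block layout given in the BGZF section of the SAM specification: each block is an ordinary gzip member whose extra field carries a "BC" subfield recording the block's total size minus one. `bgzf_block` is a hypothetical helper written for illustration, not HTSlib's code, and it skips real-world details such as the EOF marker block and index files.

```python
import gzip
import struct
import zlib


def bgzf_block(data: bytes) -> bytes:
    """Pack data into one BGZF-style block (hypothetical illustration helper)."""
    # Raw deflate stream (wbits=-15): no zlib or gzip wrapper of its own.
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
    cdata = compressor.compress(data) + compressor.flush()

    # Total block = 12-byte gzip header + 6-byte extra field + cdata + 8-byte trailer.
    bsize = len(cdata) + 25  # total block size minus 1
    if bsize > 0xFFFF:
        raise ValueError("too much data for a single block")

    # ID1, ID2, CM=deflate, FLG=FEXTRA, MTIME, XFL, OS=unknown, XLEN=6
    header = struct.pack("<BBBBIBBH", 31, 139, 8, 4, 0, 0, 255, 6)
    # Extra subfield: 'B', 'C', SLEN=2, BSIZE
    extra = struct.pack("<BBHH", 66, 67, 2, bsize)
    # CRC32 and length of the uncompressed data
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + cdata + trailer


first = bgzf_block(b"ACGT" * 100)
second = bgzf_block(b"TTTT" * 100)
stream = first + second

# Compatible: an ordinary gzip reader decodes the concatenated blocks.
whole = gzip.decompress(stream)

# Random access: start at a known block boundary and decode only that block.
only_second = gzip.decompress(stream[len(first):])
```

Because every block is independently compressed, a reader that knows a block's starting offset can decode from there without touching anything earlier in the file, which is the property the indexes rely on.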
The Sequence Alignment/Map format and SAMtools - PubMed
http://samtools.sourceforge.net.
pubmed.ncbi.nlm.nih.gov
November 17, 2025 at 10:19 PM
The filename was garbage because the C code used a pointer to a string computed by a Cython expression — which was garbage-collected before the C code ran. Easily fixed by keeping a live object referencing it.

Surprising that it worked reliably on other platforms! Thus i386 was a useful canary.
Ensure stdout filename object is not garbage-collected while in use in C · pysam-developers/pysam@d6fbe29
Store the output of `force_bytes(stdout_f)` in a live object to ensure it is not garbage-collected before it is used within samtools_dispatch(). This fixes a test failure on i386 (SamtoolsTest/Pysa...
github.com
November 17, 2025 at 7:38 AM
I learnt Pascal before C, so writing the (cond) parentheses took some getting used to at first. Pascal had a `then` keyword that provided the needed delineation, and Python has its colon and usually indentation too.

Thus perhaps the in-vogue syntax choices have largely come full circle…
November 15, 2025 at 4:11 AM
C and C++ require the (cond) parentheses; Rust requires the then/else {stmt} braces. (Making both optional makes for bad syntax error recovery and adds ambiguity for some cases.)

🤷 I like leaving off the {} where appropriate… Both approaches have their pros and cons.
November 15, 2025 at 3:49 AM
Or you could use `samtools merge -o - …FILES… | …`.

Samtools follows the usual Unix convention whereby ‘‑’ as a filename refers to standard input or standard output as appropriate.
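
A program of your own can follow the same convention. Below is a minimal Python sketch, in which `open_dash` is a hypothetical helper name (not part of samtools or pysam) and the mode handling is deliberately simplistic:

```python
import contextlib
import sys


@contextlib.contextmanager
def open_dash(filename, mode="r"):
    """Open filename, treating '-' as stdin or stdout depending on mode.

    Hypothetical helper for illustration only.
    """
    if filename == "-":
        # Reading modes map to standard input, anything else to standard output.
        stream = sys.stdin if "r" in mode else sys.stdout
        if "b" in mode:
            stream = stream.buffer  # underlying binary stream
        # Yield without closing: the standard streams belong to the process.
        yield stream
    else:
        with open(filename, mode) as handle:
            yield handle
```

With this, `with open_dash(path, "w") as out:` writes to standard output when `path` is `-`, and to the named file otherwise.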
November 5, 2025 at 1:20 AM
I glanced briefly at the suggestions a day or two ago; e.g., “examples shouldn't use the legacy ‑S option” was music to my ears as I've been harping on about that for years.

…However none of the instances of “‑S” were actually problematic: indeed, one was in the centre of “…user-specified…”! 🙄🤷
October 29, 2025 at 8:04 PM
Apparently it is difficult to understand that implementations are not necessarily obliged to reject every out-of-spec file and produce a useful diagnostic accordingly.

A few specifications do describe in detail what should happen in error situations. IIRC HTML5 does, but it's not common.
October 10, 2025 at 11:58 AM
Thanks a lot Jon, that's crossed the streams in a really enraging way! 🫠

I assumed they meant zoom appointments with a human, but maybe there's a reason it's been rephrased from “Virtual visits” to “Virtual PCP”. Depressing…
October 2, 2025 at 2:32 AM