Lightnews — Scholar-powered news

Reposted by Noam Teyssier

Robert Aboukhalil

@robert.bio

Excited to announce a new bqtools tutorial on sandbox.bio by @noamteyssier.bsky.social! Learn about the BINSEQ file format, and how it can replace FASTQ files for better data compression and faster parallel processing: sandbox.bio/tutorials/bq...

Efficient sequence analysis with bqtools

Interactive bqtools tutorial: learn to analyse sequence data efficiently with BINSEQ files using a command-line interface in your browser.

sandbox.bio

November 18, 2025 at 8:35 PM

Noam Teyssier

@noamteyssier.bsky.social

I work with large collections of AnnDatas for single-cell work and got tired of opening notebooks for simple operations. Built a CLI tool to handle some common stuff directly from the terminal.

Quick ops: downsample, concat, pseudobulk, QC, metadata export, etc.

github.com/noamteyssier...

GitHub - noamteyssier/anntools: a cli-driven anndata toolkit

a cli-driven anndata toolkit. Contribute to noamteyssier/anntools development by creating an account on GitHub.

github.com

November 18, 2025 at 8:14 PM

Noam Teyssier

@noamteyssier.bsky.social

BINSEQ is a high-performance format for sequencing data and bqtools is a CLI tool that lets you create and manipulate these files in the style of samtools.

Excited to release a tutorial with @robert.bio showcasing how to use it to encode, decode, and grep sequences in the browser on sandbox.bio!

Efficient sequence analysis with bqtools

Interactive bqtools tutorial: learn to analyse sequence data efficiently with BINSEQ files using a command-line interface in your browser.

sandbox.bio

November 14, 2025 at 6:12 PM

Noam Teyssier

@noamteyssier.bsky.social

New bqtools release with some nice new features!

1. Support for fuzzy matching using sassy
2. Multi-Pattern counting (like `grep -c` but the count is for each individual pattern provided)
3. Pattern files (providing large lists of patterns as either regex or literals)

github.com/ArcInstitute...

Release bqtools-0.4.8 · ArcInstitute/bqtools

What's Changed: 116 support fuzzy grep with sassy by @noamteyssier in #118; 119 gate fuzzy matching behind feature flag by @noamteyssier in #120; 58 implement a pattern count feature by @noamteyssier...

github.com

November 7, 2025 at 1:12 AM
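
The multi-pattern counting idea from the post above can be sketched in a few lines. This is an illustration only, not bqtools code: the function name `count_per_pattern` is hypothetical, and it assumes the count is the number of records that match each pattern (the way `grep -c` counts matching lines), which the post does not spell out.

```python
import re
from typing import Dict, Iterable, List


def count_per_pattern(sequences: Iterable[str], patterns: List[str]) -> Dict[str, int]:
    """Count, for each regex pattern, how many sequences contain a match.

    Hypothetical helper: mirrors `grep -c`, but keeps one counter per pattern
    instead of a single combined count.
    """
    # Compile once so each pattern is parsed a single time.
    compiled = [(p, re.compile(p)) for p in patterns]
    counts = {p: 0 for p in patterns}
    for seq in sequences:
        for pattern, regex in compiled:
            # A sequence contributes at most 1 to each pattern's count.
            if regex.search(seq):
                counts[pattern] += 1
    return counts


if __name__ == "__main__":
    reads = ["ACGTACGT", "TTTTACGA", "GGGG"]
    print(count_per_pattern(reads, ["ACG", "T{4}", "CCC"]))
```

A single pass over the records updates every counter, so the input is read once no matter how many patterns are supplied.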

Noam Teyssier

@noamteyssier.bsky.social

I've updated the BINSEQ manuscript to stay up to date with changes since I originally put it out at the beginning of the year

Some notable changes:
1. Support for ambiguous bases with 4bit encoding
2. Support for sequence headers
3. Improved API

www.biorxiv.org/content/10.1...

BINSEQ: A Family of High-Performance Binary Formats for Nucleotide Sequences

Modern genomics produces billions of sequencing records per run, which are typically stored as gzip-compressed FASTQ files. While this format is widely used, it is not optimal for high-throughput proc...

www.biorxiv.org

October 29, 2025 at 8:41 PM
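
To see why four bits per base are enough for ambiguous bases, here is a minimal sketch. It is not the BINSEQ layout: the bitmask table, the high-nibble-first packing, and the names `encode_4bit` and `decode_4bit` are all assumptions made for illustration. Each of A, C, G, T gets one bit, so an ambiguity code such as N is simply the union of the bases it may stand for.

```python
# Assumed bitmask convention: A=1, C=2, G=4, T=8; ambiguity codes are unions.
IUPAC_BITS = {
    "A": 1, "C": 2, "G": 4, "T": 8,
    "M": 3, "R": 5, "W": 9, "S": 6, "Y": 10, "K": 12,
    "V": 7, "H": 11, "D": 13, "B": 14, "N": 15,
}
# Code 0 is unused by IUPAC here and serves as padding.
BITS_TO_IUPAC = {bits: base for base, bits in IUPAC_BITS.items()}
BITS_TO_IUPAC[0] = "-"


def encode_4bit(seq: str) -> bytes:
    """Pack two bases per byte, first base in the high nibble."""
    codes = [IUPAC_BITS[base] for base in seq.upper()]
    if len(codes) % 2:
        codes.append(0)  # pad odd-length sequences with an empty nibble
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def decode_4bit(packed: bytes, length: int) -> str:
    """Unpack `length` bases; the length is needed to drop any padding."""
    bases = []
    for byte in packed:
        bases.append(BITS_TO_IUPAC[byte >> 4])
        bases.append(BITS_TO_IUPAC[byte & 0x0F])
    return "".join(bases[:length])


if __name__ == "__main__":
    packed = encode_4bit("ACGTN")
    print(packed.hex(), decode_4bit(packed, 5))
```

A 2-bit code has only four values, so it can hold A, C, G, and T but nothing else; doubling to four bits halves the density in exchange for representing every IUPAC symbol.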

Noam Teyssier

@noamteyssier.bsky.social

Was just about to submit a revision for a paper and realized that I wouldn't be able to submit my source for the text because it was written with typst.

Such a bummer - moving this over to tex now but damn what a waste of time!

typst is just so much nicer to work with.

October 28, 2025 at 11:31 PM

Reposted by Noam Teyssier

Rick Beeloo

@rickbitloo.bsky.social

Around 10% of your Nanopore reads (SQK-RBK114) are incorrectly trimmed. Here is why, and how our new tool Barbell solves it:

www.biorxiv.org/content/10.1...

Want to get started? github.com/rickbeeloo/b...

October 23, 2025 at 8:16 PM

Reposted by Noam Teyssier

Martin Kampmann

@kampmann.bsky.social

Thank you Alzforum for featuring our new preprint identifying regulators of disease states of #microglia.

Project led by Amanda McQuade, computation by Reet Mishra, collaboration with the Nunez and De Jager labs.

Alzforum
www.alzforum.org/news/researc...

Preprint
www.biorxiv.org/content/10.1...

October 22, 2025 at 6:28 PM

Noam Teyssier

@noamteyssier.bsky.social

Had an old tool called `hist` to run `sort | uniq -c` years ago but thought up a high-perf impl for it today. Tried it out and found a 25x throughput over the coreutils version.

Big takeaway - arena allocators, hashmaps, and serde work super well together.

github.com/noamteyssier...

GitHub - noamteyssier/hist-rs: An efficient unique-line counter (25x over `sort | uniq -c`)

github.com

October 22, 2025 at 11:03 PM
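
The general idea behind replacing `sort | uniq -c` with a hash map can be sketched as follows. This is not the hist-rs implementation (it is Python, and it uses no arena allocator or serde); the function name `count_lines` is hypothetical. It only shows why a hash-based counter can skip sorting the full input: every line is hashed once, and only the unique keys are sorted at the end.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (count, line) pairs ordered by line text, like `sort | uniq -c`."""
    # One pass over the input; each line costs a single hash-map update.
    counts = Counter(line.rstrip("\n") for line in lines)
    # Sort only the unique lines, not the whole input.
    return sorted(((n, line) for line, n in counts.items()), key=lambda pair: pair[1])


if __name__ == "__main__":
    for n, line in count_lines(["b\n", "a\n", "b\n"]):
        print(f"{n:7d} {line}")
```

When the input has many repeated lines, the final sort touches far fewer items than sorting every line up front, which is where most of the saving comes from in this sketch.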

Reposted by Noam Teyssier

Ragnar {Groot Koerkamp}

@curiouscoding.nl

Paraseq 0.4 is out now! With double the throughput for processing paired-end input :)

github.com/noamteyssier...

September 4, 2025 at 10:41 PM

Noam Teyssier

@noamteyssier.bsky.social

Added a feature to bqtools yesterday for colored grep output. It also supports colored FASTX output. Already useful this morning as I troubleshoot some sequencing outputs!

September 4, 2025 at 5:56 PM

Reposted by Noam Teyssier

Martin Kampmann

@kampmann.bsky.social

Excited that the paper presenting our mouse brain in vivo CRISPR screening platform is out today in @natneuro.nature.com!

Great team effort, led by Biswa Ramani and @ivlrose.bsky.social in the Kampmann lab.

www.nature.com/articles/s41...

CRISPR screening by AAV episome-sequencing (CrAAVe-seq): a scalable cell-type-specific in vivo platform uncovers neuronal essential genes - Nature Neuroscience

The authors developed an adeno-associated virus-based high-throughput in vivo CRISPR screening platform for endogenous mouse brain cell types. Using this platform, they define genes and pathways essen...

www.nature.com

August 22, 2025 at 10:15 PM

Reposted by Noam Teyssier

Antoine Limasset

@npmalfoy.bsky.social

Preprint alert!
We present K2Rmini, an ultra-fast, grep-like tool that extracts sequences of interest from FASTA/FASTQ files based on their k-mer content.
www.biorxiv.org/content/10.1...
A thread

Accelerating k-mer-based sequence filtering

The exponential growth of global sequencing data repositories presents both analytical challenges and opportunities. While k-mer-based indexing has improved scalability over traditional alignment fo...

www.biorxiv.org

July 2, 2025 at 1:00 PM

Noam Teyssier

@noamteyssier.bsky.social

Writing in rust again after a long stretch of python is such a breath of fresh air.

June 26, 2025 at 2:47 AM

Reposted by Noam Teyssier

Arc Institute

@arcinstitute.org

Introducing Arc Institute’s first virtual cell model: STATE

June 23, 2025 at 5:28 PM

Noam Teyssier

@noamteyssier.bsky.social

Pretty cool little utility and blog post - fun to see the business/pleasure index for rust crates

boydkane.com/projects/cra...

Downloaded more for business, or pleasure?

This mini-project was inspired by this tweet: After which I spent about two hours making a small script that grabs data from the rust package repository crates.io, and analyses the ...

boydkane.com

June 18, 2025 at 8:32 PM

Reposted by Noam Teyssier

Heng Li

@lh3lh3.bsky.social

Preprint on "Improving spliced alignment by modeling splice sites with deep learning". It describes minisplice for modeling splice signals. Minimap2 and miniprot now optionally use the predicted scores to improve spliced alignment.
arxiv.org/abs/2506.12986

June 17, 2025 at 1:49 AM

Reposted by Noam Teyssier

Bede Constantinides

@bedec.bsky.social

New preprint! Deacon is a versatile tool for filtering FASTA/FASTQ files and streams at hundreds of megabases per second using minimizers, built with rapid metagenomic host depletion in mind, but equally useful for search.
github.com/bede/deacon

bioRxiv Bioinfo @biorxiv-bioinfo.bsky.social · Jun 13

Deacon: fast sequence filtering and contaminant depletion https://www.biorxiv.org/content/10.1101/2025.06.09.658732v1

June 13, 2025 at 1:25 PM

Reposted by Noam Teyssier

Seth Stadick

@ducktapeprogrammer.bsky.social

ish is a grep-like CLI tool that uses optimal alignment instead of exact matching.

It’s record-type aware, supporting line, FASTA, and FASTQ records.

Built in Mojo as a proof of concept for bioinformatics.

🧵1/5

bioRxiv Bioinfo @biorxiv-bioinfo.bsky.social · Jun 9

Ish: SIMD and GPU Accelerated Local and Semi-Global Alignment as a CLI Filtering Tool https://www.biorxiv.org/content/10.1101/2025.06.04.657890v1

June 9, 2025 at 1:05 PM

Reposted by Noam Teyssier

Rayan Chikhi

@rayanchikhi.bsky.social

Slides from my talk (with @kamilsjaron.bsky.social) on a history of k-mers in bioinformatics: rayan.chikhi.name/pdf/2025-kme...

June 3, 2025 at 9:25 AM

Reposted by Noam Teyssier

Pierre Peterlongo

@pierrepeterlongo.bsky.social

📜 Excited to share insights from our recent paper: "Kaminari: a resource-frugal index for approximate colored k-mer queries". The study aims to efficiently identify documents containing a query string, focusing on DNA strings. www.biorxiv.org/content/10.1... 🧬 🖥️ 1/8

May 27, 2025 at 12:06 PM

Reposted by Noam Teyssier

Daniel Jones

@dcjones.bsky.social

Our Proseg paper is now out in Nature Methods!
www.nature.com/articles/s41...

We borrowed a sampling procedure from the cell simulation literature to infer cell boundaries that best explain the spatial distribution of transcripts.

Cell simulation as cell segmentation - Nature Methods

Proseg is a segmentation approach for single-cell spatially resolved transcriptomics data that uses unsupervised probabilistic modeling of the spatial distribution of transcripts to accurately segment...

www.nature.com

May 22, 2025 at 5:52 PM

Reposted by Noam Teyssier

Internet Archive

@archive.org

📄 The scanners are humming, the film is flowing.

The microfiche livestream is up—digitizing government docs in real time for Democracy’s Library.

Perfect second-screen vibes: Preservation in progress.

🕢 Live M-F, 7:30am–3:30pm PT (except U.S. holidays)
➡️ www.youtube.com/live/aPg2V5R...

lofi Archive radio 🎞️ beats to scan/read microfiche to

YouTube video by Internet Archive

www.youtube.com

May 22, 2025 at 2:37 PM

Reposted by Noam Teyssier

Ragnar {Groot Koerkamp}

@curiouscoding.nl

So yeah, this is why I keep going on about: do we have to sanitize user input or not? File formats where bad inputs are simply not representable are good, because it saves us from this 100x slowdown.

May 16, 2025 at 5:51 PM

Noam Teyssier

@noamteyssier.bsky.social

Just merged in an awesome new feature for xsra to support named pipes with @robp.bsky.social.

This lets you skip an intermediary write step and go straight from SRA to downstream tools.

It works with accessions whether they are on or off disk.

May 9, 2025 at 3:26 PM
