Nezar Abdennur
@nvictus.bsky.social
computational biologist / biological computer / asst prof @UMassChan / phd @MIT / http://abdenlab.org
Pinned
I'm proud to announce the latest release of 🧬 #Oxbow 🏹, with new features to make NGS data analysis more powerful, efficient, and "composable".

Learn more at: oxbow.readthedocs.io
Reposted by Nezar Abdennur
In the genomics community, we have focused pretty heavily on achieving state-of-the-art predictive performance.

While undoubtedly important, how we *use* these models after training is potentially even more important.

tangermeme v1.0.0 is out now. Hope you find it useful!
My talk on #Composability in genomic software at #SciPy2025 is up on YouTube where I showcase both #anywidget and #oxbow.

Thank you to the organizers for the opportunity to present this to both computational biologists and the wider scientific computing community!

www.youtube.com/watch?v=G22_...
Nezar Abdennur - Accelerating Genomic Data Science and AI/ML with Composability | SciPy 2025
YouTube video by SciPy
www.youtube.com
Our #anywidget tutorial from last year's #SciPy conf was uploaded to youtube! Check it out for a hands-on walkthrough to create your own web-based widgets.
We anticipate that joint dimensionality reduction and projection will become a foundational norm for comparative and integrative analysis of long-range interaction profiles in Hi-C/3C+ data. For example, existing methods for working with classic A/B vectors can be extended to joint higher-order embeddings.
We used jointly-hic to create an atlas of 89 human Hi-C samples, uncovering distinct patterns of nuclear architecture associated with heterochromatin composition and demonstrating how higher-order principal components capture missing information about gene expression and regulatory element activity.
jointly-hic accomplishes this using mini-batch incremental PCA, allowing for joint decomposition of arbitrarily many contact matrices at any resolution with constant memory.
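A minimal sketch of the mini-batch idea, assuming scikit-learn's `IncrementalPCA` as a stand-in (this is not jointly-hic's own code). The toy matrices, sample names, and sizes are hypothetical. Rows are treated as genomic bins and fed to `partial_fit` one batch at a time, so only one batch needs to be in memory during fitting:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
n_bins, n_components, batch_size = 200, 4, 50


def toy_contact_matrix():
    """Hypothetical symmetric, low-rank 'contact matrix' plus noise."""
    base = rng.standard_normal((n_bins, 3))
    m = base @ base.T + 0.1 * rng.standard_normal((n_bins, n_bins))
    return (m + m.T) / 2


# In practice each matrix would be streamed from disk; a dict keeps the toy short.
samples = {name: toy_contact_matrix() for name in ("sample_a", "sample_b", "sample_c")}

# One model is updated with row batches from every sample in turn.
ipca = IncrementalPCA(n_components=n_components)
for matrix in samples.values():
    for start in range(0, n_bins, batch_size):
        ipca.partial_fit(matrix[start:start + batch_size])

# Every sample is projected onto the same components, so embeddings share axes.
embeddings = {name: ipca.transform(matrix) for name, matrix in samples.items()}
```

Because all samples are projected with the same fitted components, the resulting embeddings live in one coordinate system and can be compared column by column.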
Joint decomposition allows for robust and directly comparable low dimensional representations of arbitrarily many contact maps, providing insights into genome organization across diverse biological contexts, from different tissues to developmental stages.
The classic A/B compartment track comes from matrix factorization of a contact matrix into eigenvectors or PCs. Done separately, each map is projected onto a different coordinate system. Comparing such vectors directly is problematic, especially if seeking info from **higher-order** components.
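One concrete face of the comparison problem is sign ambiguity: an eigenvector and its negation are equally valid, so two maps decomposed separately may agree only up to a flip. The sketch below uses NumPy with hypothetical toy maps built from a shared block signal; it illustrates the ambiguity and is not taken from any particular Hi-C tool:

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 50
# Hypothetical compartment-like signal shared by both toy maps.
signal = np.sign(np.sin(np.linspace(0, 6 * np.pi, n_bins)))


def toy_contact_map(noise=0.1):
    """Symmetric rank-one signal plus symmetric noise."""
    m = np.outer(signal, signal) + noise * rng.standard_normal((n_bins, n_bins))
    return (m + m.T) / 2


def leading_eigenpair(m):
    """Largest eigenvalue and its eigenvector (eigh sorts ascending)."""
    vals, vecs = np.linalg.eigh(m)
    return vals[-1], vecs[:, -1]


map_a, map_b = toy_contact_map(), toy_contact_map()
lam_a, vec_a = leading_eigenpair(map_a)
lam_b, vec_b = leading_eigenpair(map_b)

# The flipped vector satisfies the same eigen-equation, so its sign is arbitrary.
flipped_is_valid = np.allclose(map_a @ (-vec_a), lam_a * (-vec_a))

# Separately computed vectors are only guaranteed to agree up to that sign.
agreement_up_to_sign = abs(vec_a @ vec_b)
```

For higher-order components the ambiguity extends beyond sign to ordering and rotation among components with similar eigenvalues, which is why a shared basis helps.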
Reposted by Nezar Abdennur
A huge challenge I face when doing ML + genomics analysis is *friction*: the stupid error messages (wrong device!) and dumb implementation issues that snap you out of the zone. I wrote a vignette on how tangermeme has helped me reduce this friction:

tangermeme.readthedocs.io/en/latest/ho...
How To: Reduce Friction and Save Time with Tangermeme — tangermeme v0.1.0 documentation
tangermeme.readthedocs.io
Reposted by Nezar Abdennur
(4) bpnet-lite: Load official Chrom/BPNet models into PyTorch for downstream tangermeme integration. Improved command-line tools + docs. Still concerns about perf of models trained from scratch -- will be resolved next version!

github.com/jmschrei/bpn...

bsky.app/profile/jmsc...
We’re excited and eager for feedback, so please give oxbow a try!

`pip install oxbow`
It also supports:

* Column projection and pushdown (parsing only the fields you need)
* Complex and nested field types (e.g. alignment tags, variant genotype call data, etc.)
* Genomic range-based queries via an index
* User-defined transports and file systems
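To make the first bullet concrete, here is a generic, standard-library sketch of column projection over a BED-like tab-delimited stream. It is not oxbow's API; the field list, converters, and `scan_records` helper are hypothetical and only illustrate converting the requested fields while leaving the rest unparsed:

```python
import io

# Hypothetical schema for a BED-like record.
BED_FIELDS = ("chrom", "start", "end", "name", "score")
CONVERTERS = {"chrom": str, "start": int, "end": int, "name": str, "score": float}


def scan_records(handle, columns):
    """Yield dicts containing only the requested columns, converted lazily."""
    indices = [BED_FIELDS.index(c) for c in columns]
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        raw = line.rstrip("\n").split("\t")
        # Only the projected fields are type-converted; the others stay raw text.
        yield {BED_FIELDS[i]: CONVERTERS[BED_FIELDS[i]](raw[i]) for i in indices}


text = "chr1\t100\t200\tpeak1\t5.0\nchr2\t300\t450\tpeak2\t7.5\n"
rows = list(scan_records(io.StringIO(text), columns=["chrom", "end"]))
```

A real columnar reader would push the projection further down, ideally skipping decoding of unrequested fields altogether.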
This update (v0.4.x) provides complete #ApacheArrow data models for 11 file formats and counting, including the GA4GH/htslib formats and UCSC’s BigWig/BigBed.
We revamped the #rustlang backend and implemented a new "DataSource" API in #Python, which allows for streaming conventional #genomic files – in-memory, on-disk, or in the cloud – into the modern data tools you use regularly, including #Pandas, #Polars, #DuckDB, and #Dask.