Pete Bachant
@petebachant.me
RSE @caltech.edu

Bicycles, fluid dynamics, Python, open source, open science, reproducibility. https://petebachant.me | https://calkit.org
Reposted by Pete Bachant
Thanks @insidehighered.com for publishing our OpEd w Annie K. Lamar #publicvoices of The Oped Project @ucsantabarbara.bsky.social !

🤖Open source software & infra accelerates scientific discoveries by removing financial & technical barriers.

It's about time we start treating it as a public good👇
December 12, 2025 at 5:52 PM
For those who think we can solve the reproducibility crisis with policy and education alone, consider the pedestrian and cyclist injury/fatality rate from drivers in the US versus somewhere like the Netherlands. Infrastructure matters.

#openscience #reproducibility
December 12, 2025 at 11:16 PM
Overleaf to VS Code in the browser in under 2 minutes: youtu.be/GjyMxwYbdXk

#phdstudents #academia #openscience
Overleaf to GitHub Codespaces (VS Code in the browser) in under 2 minutes
YouTube video by Calkit
youtu.be
December 8, 2025 at 4:23 PM
What uv has done for Python project management is what I hope Calkit can do for research/analytics projects.

#openscience #datascience #reproducibility
December 6, 2025 at 6:22 PM
Small improvement to the declarative Conda environment handling in Calkit: Now you can put editable install paths in the pip section. No more "create, mutate, export" cycles--just use the environment for something like `calkit nb execute`, and it just works. #python

github.com/calkit/calki...
Release v0.32.5 · calkit/calkit
What's Changed Add ability to put -e . in pip section of Conda env spec by @petebachant in #628 Full Changelog: v0.32.4...v0.32.5
github.com
December 4, 2025 at 9:32 PM
Jupyter notebooks are fine to use in production as long as their outputs are only considered "official" when the notebooks are run in batch mode, e.g., with nbconvert or papermill. That is, you should never deliver any result from a notebook you ran interactively.

#reproducibility #openscience
December 2, 2025 at 7:26 PM
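A minimal sketch of what "batch mode" from the post above could look like, assuming Jupyter and nbconvert are installed. The helper names `nbconvert_command` and `run_in_batch` and the `analysis.ipynb` file name are hypothetical, chosen only for illustration. The command runs every cell top to bottom in a fresh kernel and writes an executed copy, so the delivered outputs never depend on interactive state.

```python
import subprocess
from pathlib import Path


def nbconvert_command(notebook: str, output_dir: str = "executed") -> list[str]:
    """Build a `jupyter nbconvert` command that executes a notebook non-interactively."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",               # keep the .ipynb format
        "--execute",                      # run all cells in a fresh kernel
        "--output", Path(notebook).stem,  # name of the executed copy
        "--output-dir", output_dir,       # keep it apart from the source notebook
        notebook,
    ]


def run_in_batch(notebook: str, output_dir: str = "executed") -> None:
    """Run the notebook in batch mode; raises CalledProcessError if any cell fails."""
    subprocess.run(nbconvert_command(notebook, output_dir), check=True)


if __name__ == "__main__":
    # Hypothetical notebook name; only the executed copy would be treated as "official".
    run_in_batch("analysis.ipynb")
```

Because `check=True` turns a failing cell into an exception, a broken notebook cannot silently produce an "official" result.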
Tired of manually uploading figures to Overleaf? Over the past few weeks I tried to make it as easy as possible to sync an Overleaf project with a Calkit project so updating the data or analysis can automatically be reflected in a LaTeX document: youtu.be/BuzLFO0JYxU

#openscience #reproducibility
How to connect your analysis with your writing in Overleaf using Calkit
YouTube video by Calkit
youtu.be
November 24, 2025 at 3:10 PM
Separation of concerns is great but we also need minimization of concerns. In other words, "the main thing is to keep the main thing the main thing."

#softwareengineering #swe #developer
November 12, 2025 at 11:28 PM
The Calkit Run GitHub Action now authenticates with OIDC tokens, so no secrets are required to push artifacts, e.g., the latest PDF of your paper, up to the cloud: github.com/calkit/run-a...

#automation #openscience #reproducibility
Release v2.0.0 · calkit/run-action
With this version we automatically fetch a DVC token from calkit.io using GitHub OIDC. Full Changelog: v1...v2.0.0
github.com
November 8, 2025 at 3:18 PM
💥 First "real world" Calkit repro pack just dropped!

In this paper we did a bunch of benchmarking for a brand new astronomical alert brokering system designed to interface with the Rubin Observatory.

Check out the repo here: github.com/boom-astro/b...

#openscience #reproducibility #opensource
GitHub - boom-astro/boom-paper: The first paper about BOOM development.
The first paper about BOOM development. Contribute to boom-astro/boom-paper development by creating an account on GitHub.
github.com
November 7, 2025 at 5:09 PM
Reposted by Pete Bachant
Really love this kind of reality-check meta-research 👉“The struggle to make transparency mainstream: initial evidence for a slow uptake of open science practices in PhD theses”

royalsocietypublishing.org/doi/full/10....
November 5, 2025 at 3:05 AM
If you publish a "repro pack" with your paper, you're awesome, but there's about a 10% chance it will actually run on someone else's computer. In this post I explain why that isn't your fault, why it matters, and what we should do about it: petebachant.me/single-button

#openscience #reproducibility
Single-button reproducibility: The what, the why, and the how
petebachant.me
October 17, 2025 at 2:05 PM
Calkit projects can now incorporate Julia Jupyter notebooks into their pipelines: calkit.io/calkit/examp...

#julialang #reproducibility #openscience
Calkit
calkit.io
October 17, 2025 at 2:46 AM
Why number your notebooks/scripts and execute them manually when you could simply put them into a pipeline that automatically manages their environments and caches their outputs?

docs.calkit.org/pipeline/

#datascience #automation
The pipeline - Calkit
docs.calkit.org
October 14, 2025 at 3:01 PM
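To illustrate the caching idea in the post above, here is a rough sketch of how a pipeline stage might be skipped when its inputs have not changed. This is not Calkit's implementation: `file_digest`, `run_stage`, and the `pipeline.lock.json` file name are all hypothetical, and a real tool would also track outputs, commands, and environments.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def file_digest(paths: list[Path]) -> str:
    """Combine the names and contents of all input files into one SHA-256 digest."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(p.name.encode())
        h.update(p.read_bytes())
    return h.hexdigest()


def run_stage(
    name: str,
    inputs: list[Path],
    action: Callable[[], None],
    lock_file: Path = Path("pipeline.lock.json"),  # hypothetical lock file name
) -> bool:
    """Run `action` only if the inputs changed since the last recorded run.

    Returns True if the stage ran, False if the cached result was reused.
    """
    lock = json.loads(lock_file.read_text()) if lock_file.exists() else {}
    digest = file_digest(inputs)
    if lock.get(name) == digest:
        return False  # inputs unchanged: skip the stage
    action()
    lock[name] = digest  # record what the outputs were built from
    lock_file.write_text(json.dumps(lock, indent=2))
    return True
```

With something like this in place, the order and staleness of each step live in one file instead of in numbered script names and memory.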
1. Generate evidence to support some claims
2. Don't automate the creation of said evidence

Congratulations, you've just contributed to the reproducibility crisis!

#reproducibility #openscience
October 8, 2025 at 8:29 PM
Reposted by Pete Bachant
So much brilliant work never makes it into a paper.
The code, the data, the long nights helping others debug.
At pyOpenSci, we believe that code, data, and community are the pulse.
Research advances quickly when we build together & openly.
Join us. 💛 bit.ly/pyos-volunteer
#openscience #opensource
Get involved with pyOpenSci
pyOpenSci’s Website
bit.ly
October 8, 2025 at 5:20 PM
Don't be ashamed of "messy" code. If it works, it's good. Share it.

#openscience #reproducibility
October 5, 2025 at 3:18 PM
Reading through some slides from 2013 titled "how to succeed in reproducible research without really trying". It's true we have all the tools needed for researchers to build their own reproducible workflows, but still many do not. Maybe the tools are still too hard to learn and use!
October 3, 2025 at 11:47 AM
Programming tip: Name classes after the data they encapsulate, not the actions they perform on that data. For example, instead of SchemaProcessor, just call it Schema:

processed_schema = Schema().process()

#programming #oop #softwareengineering
September 28, 2025 at 9:01 AM
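The class-naming tip above can be fleshed out a little. This is only a sketch: the `fields` attribute and the normalization done in `process` are assumptions made for illustration, not part of the original post.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Named after the data it holds: a mapping of field names to type names."""

    fields: dict[str, str] = field(default_factory=dict)

    def process(self) -> "Schema":
        """Return a new Schema with stripped, lowercased names and types."""
        return Schema(
            {k.strip().lower(): v.strip().lower() for k, v in self.fields.items()}
        )


processed_schema = Schema({" Name ": "STR", "Age": "Int"}).process()
print(processed_schema.fields)  # prints {'name': 'str', 'age': 'int'}
```

The action lives on the data it operates on, so there is no separate SchemaProcessor to name, instantiate, or keep in sync.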
Hot take: Notebooks are fine in production as long as they're part of a reproducible pipeline

docs.calkit.org/notebooks/

#reproducibility #datascience #openscience
Notebooks - Calkit
docs.calkit.org
September 26, 2025 at 10:01 AM
Please don't number your scripts. Refer back to (2) and use a pipeline (like Calkit's of course)!

www.nature.com/articles/d41...

#reproducibility #automation #openscience
It’s a new term: here are 99 lab hacks
Nature asked contributors, editors and working researchers to share their best advice for scientists.
www.nature.com
September 26, 2025 at 8:37 AM
Reposted by Pete Bachant
In a newly released arXiv preprint, we explore how open science practices like sharing data, code and preprints relate to citation impact in French-authored research over a 3-year period.

Thanks to @ouvrirlascience.bsky.social for highlighting its national importance.

🔗 Read more: plos.io/3Vmykrj
September 16, 2025 at 4:53 PM
Reproducibility tip: Any figure, dataset, ML model, etc., should not be shared until it is produced with an automated, version-controlled pipeline.

#reproducibility #openscience
September 16, 2025 at 2:28 PM
While profiling some CUDA code on a SLURM cluster I realized I was not working in a very reproducible way, which could become a problem down the road if I ever needed to know how a certain result was generated, so Calkit now has SLURM integration: docs.calkit.org/pipeline/slu...
SLURM integration - Calkit
docs.calkit.org
September 15, 2025 at 2:55 PM