Sebastian Galkin
functionth.bsky.social
22 followers 34 following 6 posts
So happy with this milestone. Lots of work went into this one!
Today at SciPy 2025 we released Icechunk 1.0, an open source package and specification that enables database-style transactions against petabyte-scale array datasets using only cloud object storage as infrastructure. Read about it on our blog earthmover.io/blog/icechun..., or visit earthmover.io
Icechunk 1.0: Production-Grade Cloud-Native Array Storage Is Here - Earthmover
A year ago, we made an important internal decision which set Earthmover on a new course—we decided to refactor and open source our core technology for storing array-based data in the cloud. This took ...
earthmover.io
Reposted by Sebastian Galkin
𝐻𝑜𝑤 𝑑𝑜𝑒𝑠 𝐼𝑐𝑒𝑐ℎ𝑢𝑛𝑘 𝑎𝑣𝑜𝑖𝑑 𝑟𝑒𝑑𝑢𝑛𝑑𝑎𝑛𝑡 𝑠𝑡𝑜𝑟𝑎𝑔𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝑑𝑎𝑡𝑎 𝑣𝑒𝑟𝑠𝑖𝑜𝑛𝑠?

Icechunk stores only new or changed chunks for each version, with no redundant copies or rewrites. You get instant time travel, branching, and efficient updates, all with negligible storage overhead.

More: bit.ly/3F1XFST
Icechunk: Efficient storage of versioned array data - Earthmover
We recently got an interesting question in Icechunk’s community Slack channel (thank you Iury Simoes-Sousa for motivating this post): I’m new to Icechunk. How is the storage managed for redundant info...
earthmover.io
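The "only new or changed chunks" idea from the post above can be illustrated with a toy model. This is a minimal sketch, not Icechunk's API or storage format: `ToyVersionedStore`, its methods, and the integer chunk ids are all hypothetical names made up for illustration. Each version is a small manifest that points at chunk ids, and a commit writes only the chunks that changed.

```python
# Toy model only: NOT Icechunk's API or on-disk format.
# All names here (ToyVersionedStore, commit, read) are hypothetical.


class ToyVersionedStore:
    """Each version is a manifest mapping chunk index -> stored chunk id."""

    def __init__(self):
        self.chunks = {}    # chunk id -> bytes, each written exactly once
        self.versions = []  # one manifest (dict) per commit

    def commit(self, changed):
        """Create a new version: previous manifest plus the changed chunks."""
        manifest = dict(self.versions[-1]) if self.versions else {}
        for index, data in changed.items():
            chunk_id = len(self.chunks)   # fresh id for every written chunk
            self.chunks[chunk_id] = data  # only changed chunks are stored
            manifest[index] = chunk_id    # unchanged indices keep old ids
        self.versions.append(manifest)
        return len(self.versions) - 1

    def read(self, version, index):
        """Time travel: read a chunk as it was in a given version."""
        return self.chunks[self.versions[version][index]]


store = ToyVersionedStore()
v0 = store.commit({(0, 0): b"aaaa", (0, 1): b"bbbb", (1, 0): b"cccc"})
v1 = store.commit({(0, 1): b"BBBB"})  # only one of three chunks changes
```

In this sketch the second version adds one stored chunk rather than three, while both versions remain readable in full.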
Reposted by Sebastian Galkin
Our latest blog post dives into the chaos of the status quo, where every tweak means regenerating the 𝑤ℎ𝑜𝑙𝑒 𝑑𝑎𝑡𝑎𝑠𝑒𝑡, and collaboration and experimentation are often stifled by silos and secret knowledge. Check out the full post: earthmover.io/blog/tensoro...
TensorOps: Scientific Data Doesn't Have to Hurt - Earthmover
Curious how your team scores on the "Data Pain Survey"? Wondering why your teams are building Rube Goldberg machines just to put some data on a map? Or just want to see our plan to bring order to your...
earthmover.io
After months of Rust, I wrote some Python this weekend. I immediately got burned by global mutable state
Last week @deepakcherian.bsky.social gave a fascinating talk at NCAR on data sharing and open-data. The historic perspective, the achievements and failures past and present, how to learn and move forward to fulfill the promises. Remarkable and illuminating www.youtube.com/watch?v=JZT3...
CISL Seminar: Deepak Cherian (Earthmover)
YouTube video by NCAR Computational and Information Systems Laboratory (CISL)
www.youtube.com
Had the idea of using Icechunk (a multi-dimensional array database) for something I would never use Icechunk for
1/ 🚨 New Blog Post Alert: "𝐿𝑒𝑎𝑟𝑛𝑖𝑛𝑔 𝐴𝑏𝑜𝑢𝑡 𝐼𝑐𝑒𝑐ℎ𝑢𝑛𝑘 𝐶𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑐𝑦 𝑤𝑖𝑡ℎ 𝑎 𝐶𝑙𝑖𝑐ℎ𝑒́𝑑 𝑏𝑢𝑡 𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑣𝑒 𝐸𝑥𝑎𝑚𝑝𝑙𝑒" 🏦🔁

👉 Read it here: earthmover.io/blog/learnin...
Learning about Icechunk consistency with a clichéd but instructive example - Earthmover
In this post we’ll show what can happen when more than one process writes to the same Icechunk repository concurrently, and how Icechunk uses transactions and conflict resolution to guarantee consisten...
earthmover.io
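The general pattern the post above describes (concurrent writers, transactions, conflict detection) can be sketched with a toy bank-account example. This is an illustration of optimistic concurrency in general, under my own assumptions; it is not Icechunk's implementation or API, and `ToyRepo`, `ToySession`, and `ConflictError` are hypothetical names.

```python
# Toy optimistic-concurrency model: NOT Icechunk's API.
# ToyRepo, ToySession and ConflictError are hypothetical names.


class ConflictError(Exception):
    pass


class ToyRepo:
    def __init__(self, initial):
        self.snapshots = [dict(initial)]  # one full state per commit
        self.written = [set()]            # keys written by each commit

    def session(self):
        # A session remembers which snapshot it started from.
        return ToySession(self, base=len(self.snapshots) - 1)


class ToySession:
    def __init__(self, repo, base):
        self.repo = repo
        self.base = base
        self.changes = {}

    def get(self, key):
        return self.changes.get(key, self.repo.snapshots[self.base][key])

    def set(self, key, value):
        self.changes[key] = value

    def commit(self):
        tip = len(self.repo.snapshots) - 1
        if tip != self.base:  # someone else committed first
            raise ConflictError("tip moved since this session started")
        self.repo.snapshots.append({**self.repo.snapshots[tip], **self.changes})
        self.repo.written.append(set(self.changes))
        return tip + 1

    def rebase(self):
        # Safe only if no newer commit wrote a key this session also writes.
        for keys in self.repo.written[self.base + 1:]:
            if keys & set(self.changes):
                raise ConflictError("overlapping writes, cannot rebase")
        self.base = len(self.repo.snapshots) - 1


repo = ToyRepo({"alice": 100, "bob": 50, "carol": 0})
s1, s2 = repo.session(), repo.session()
s1.set("alice", s1.get("alice") - 10)
s1.set("bob", s1.get("bob") + 10)
s2.set("carol", s2.get("carol") + 5)
s1.commit()
try:
    s2.commit()
    second_commit_rejected = False
except ConflictError:
    second_commit_rejected = True
s2.rebase()  # writes do not overlap, so the rebase succeeds
s2.commit()
final = repo.snapshots[-1]
```

The second writer's first commit is rejected because the tip moved, and it succeeds after a rebase because the two sessions touched different keys. A real system would also need to consider what each session read, which this sketch ignores.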
Reposted by Sebastian Galkin
You could also do this for arbitrarily large scientific array datasets using Xarray + Icechunk + R2/Tigris

juhache.substack.com/p/0-data-dis...
0$ Data Distribution
Ju Data Engineering Weekly - Ep 78
juhache.substack.com
230k reads/sec or much more. The S3ky is the limit!
Reposted by Sebastian Galkin
📣 Blog post alert! 𝐄𝐱𝐩𝐥𝐨𝐫𝐢𝐧𝐠 𝐈𝐜𝐞𝐜𝐡𝐮𝐧𝐤 𝐬𝐜𝐚𝐥𝐚𝐛𝐢𝐥𝐢𝐭𝐲: 𝐮𝐧𝐭𝐚𝐧𝐠𝐥𝐢𝐧𝐠 𝐒𝟑'𝐬 𝐩𝐫𝐞𝐟𝐢𝐱 𝐬𝐭𝐨𝐫𝐲. This technical post by @functionth.bsky.social dives deep into the internals of how S3 shards data, showing that distributed Icechunk can easily perform 230,000 object reads/sec and beyond. earthmover.io/blog/explori...
Exploring Icechunk scalability: untangling S3's prefix story | Earthmover
We show Icechunk can scale to extremely high concurrency levels, and explain how it achieves this in modern object stores.
earthmover.io
Reposted by Sebastian Galkin
We often see folks try to convince tabular data tools to perform well with multi-dimensional array data. This post by @rabernat.bsky.social explains, from first principles, why this rarely works. It's a good one! 👇👇👇
⭐ We just released the first post in our Fundamentals series. This one is called 𝐓𝐞𝐧𝐬𝐨𝐫𝐬 𝐯𝐬. 𝐓𝐚𝐛𝐥𝐞𝐬 - 𝐖𝐡𝐲 𝐭𝐚𝐛𝐮𝐥𝐚𝐫 𝐭𝐨𝐨𝐥𝐬 𝐭𝐫𝐢𝐩 𝐨𝐯𝐞𝐫 𝐠𝐫𝐢𝐝𝐝𝐞𝐝 𝐝𝐚𝐭𝐚. earthmover.io/blog/tensors...
Fundamentals: Tensors vs. Tables | Earthmover
Why tabular tools trip over gridded data.
earthmover.io
I've worked on Icechunk almost exclusively for the last six months. I'm very proud of the result; you should check it out.
1/ 🚀 Solving #NASA’s cloud data dilemma: Icechunk unlocks 100x faster access to archival data formats

We're thrilled to publish results from our pilot project with NASA and @developmentseed.org to enable high-performance cloud-native access for NASA’s 100s of petabytes of Earth observation data.
Reposted by Sebastian Galkin
1/ Check out our latest blog post earthmover.io/blog/xarray-... to learn about the dramatic performance improvement in Xarray’s Zarr backend. We improved the “time to first byte” metric, building on Zarr-Python’s new asyncio internals.
Accelerating Xarray with Zarr-Python 3 | Earthmover
We have recently dramatically improved the performance of Xarray’s Zarr backend. This post explores how we’ve improved the “time to first byte” metric, building on Zarr-Python’s new asyncio internals.
earthmover.io