Author | Lightnews

Sebastian Galkin @functionth.bsky.social · Jul 10

So happy with this milestone. Lots of work went into this one!

Earthmover @earthmover.io · Jul 10

Today at SciPy 2025 we released Icechunk 1.0, an open source package and specification that enables database-style transactions against petabyte-scale array datasets using only cloud object storage as infrastructure. Read about it on our blog earthmover.io/blog/icechun..., or visit earthmover.io

Icechunk 1.0: Production-Grade Cloud-Native Array Storage Is Here - Earthmover

A year ago, we made an important internal decision which set Earthmover on a new course—we decided to refactor and open source our core technology for storing array-based data in the cloud. This took ...

earthmover.io

Reposted by Sebastian Galkin

Earthmover @earthmover.io · May 20

Our latest fundamentals blog post provides an overview of @zarr.dev and its open-source ecosystem. Read more: earthmover.io/blog/what-is...

Fundamentals: What Is Zarr? A Cloud-Native Format for Tensor Data - Earthmover

What Zarr is, and how it enables fast, scalable access to multidimensional array data in the cloud.

earthmover.io

Reposted by Sebastian Galkin

Earthmover @earthmover.io · May 14

𝐻𝑜𝑤 𝑑𝑜𝑒𝑠 𝐼𝑐𝑒𝑐ℎ𝑢𝑛𝑘 𝑎𝑣𝑜𝑖𝑑 𝑟𝑒𝑑𝑢𝑛𝑑𝑎𝑛𝑡 𝑠𝑡𝑜𝑟𝑎𝑔𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝑑𝑎𝑡𝑎 𝑣𝑒𝑟𝑠𝑖𝑜𝑛𝑠?

Icechunk stores only new or changed chunks for each version, with no redundant copies or rewrites. You get instant time travel, branching, and efficient updates, all with negligible storage overhead.

More: bit.ly/3F1XFST

Icechunk: Efficient storage of versioned array data - Earthmover

We recently got an interesting question in Icechunk’s community Slack channel (thank you Iury Simoes-Sousa for motivating this post): I’m new to Icechunk. How is the storage managed for redundant info...

earthmover.io

Reposted by Sebastian Galkin

Earthmover @earthmover.io · May 12

Our latest blog post dives into the chaos of the status quo - where every tweak means regenerating the 𝑤ℎ𝑜𝑙𝑒 𝑑𝑎𝑡𝑎𝑠𝑒𝑡, and collaboration and experimentation are often stifled by silos and secret knowledge. Check out the full post: earthmover.io/blog/tensoro...

TensorOps: Scientific Data Doesn't Have to Hurt - Earthmover

Curious how your team scores on the "Data Pain Survey"? Wondering why your teams are building Rube Goldberg machines just to put some data on a map? Or just want to see our plan to bring order to your...

earthmover.io

Sebastian Galkin @functionth.bsky.social · May 5

After months of Rust, I wrote some Python this weekend. I immediately got burned by global mutable state

Sebastian Galkin @functionth.bsky.social · Apr 29

Last week @deepakcherian.bsky.social gave a fascinating talk at NCAR on data sharing and open data: the historical perspective, the achievements and failures past and present, and how to learn and move forward to fulfill the promises. Remarkable and illuminating: www.youtube.com/watch?v=JZT3...

CISL Seminar: Deepak Cherian (Earthmover)

YouTube video by NCAR Computational and Information Systems Laboratory (CISL)

www.youtube.com

Sebastian Galkin @functionth.bsky.social · Apr 23

Had the idea of using Icechunk (a multi-dimensional array database) for something I would never use Icechunk for

Earthmover @earthmover.io · Apr 23

1/ 🚨 New Blog Post Alert: "𝐿𝑒𝑎𝑟𝑛𝑖𝑛𝑔 𝐴𝑏𝑜𝑢𝑡 𝐼𝑐𝑒𝑐ℎ𝑢𝑛𝑘 𝐶𝑜𝑛𝑠𝑖𝑠𝑡𝑒𝑛𝑐𝑦 𝑤𝑖𝑡ℎ 𝑎 𝐶𝑙𝑖𝑐ℎ𝑒́𝑑 𝑏𝑢𝑡 𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑣𝑒 𝐸𝑥𝑎𝑚𝑝𝑙𝑒" 🏦🔁

👉 Read it here: earthmover.io/blog/learnin...

Learning about Icechunk consistency with a clichéd but instructive example - Earthmover

In this post we’ll show what can happen when more than one process writes to the same Icechunk repository concurrently, and how Icechunk uses transactions and conflict resolution to guarantee consisten...

earthmover.io

Reposted by Sebastian Galkin

Earthmover @earthmover.io · Apr 17

1/ 💡 Our latest blog post in the fundamentals series, written by @tegnicholas.bsky.social, demystifies cloud-optimized scientific data formats!

Read more: earthmover.io/blog/fundame...

Fundamentals: What is Cloud-Optimized Scientific Data?

What cloud-optimized data really means, and how Zarr and Icechunk enable fast access to massive scientific datasets in cloud object storage.

earthmover.io

Reposted by Sebastian Galkin

TEGNicholas.bsky.social @tegnicholas.bsky.social · Apr 10

You could also do this for arbitrarily large scientific array datasets using Xarray + Icechunk + R2/Tigris

juhache.substack.com/p/0-data-dis...

0$ Data Distribution

Ju Data Engineering Weekly - Ep 78

juhache.substack.com

Sebastian Galkin @functionth.bsky.social · Apr 9

230k reads/sec or much more. The S3ky is the limit!

Reposted by Sebastian Galkin

Earthmover @earthmover.io · Apr 9

📣 Blog post alert! 𝐄𝐱𝐩𝐥𝐨𝐫𝐢𝐧𝐠 𝐈𝐜𝐞𝐜𝐡𝐮𝐧𝐤 𝐬𝐜𝐚𝐥𝐚𝐛𝐢𝐥𝐢𝐭𝐲: 𝐮𝐧𝐭𝐚𝐧𝐠𝐥𝐢𝐧𝐠 𝐒𝟑'𝐬 𝐩𝐫𝐞𝐟𝐢𝐱 𝐬𝐭𝐨𝐫𝐲. This technical post by @functionth.bsky.social dives deep into the internals of how S3 shards data, showing that distributed Icechunk can easily perform 230,000 object reads/sec and beyond. earthmover.io/blog/explori...

Exploring Icechunk scalability: untangling S3's prefix story | Earthmover

We show Icechunk can scale to extremely high concurrency levels, and explain how it achieves this in modern object stores.

earthmover.io

Reposted by Sebastian Galkin

Joe Hamman @jhamman.bsky.social · Apr 3

We often see folks try to convince tabular data tools to perform well with multi-dimensional array data. This post by @rabernat.bsky.social explains, from first principles, why this rarely works. It's a good one! 👇👇👇

Earthmover @earthmover.io · Apr 3

⭐ We just released the first post in our Fundamentals series. This one is called 𝐓𝐞𝐧𝐬𝐨𝐫𝐬 𝐯𝐬. 𝐓𝐚𝐛𝐥𝐞𝐬 - 𝐖𝐡𝐲 𝐭𝐚𝐛𝐮𝐥𝐚𝐫 𝐭𝐨𝐨𝐥𝐬 𝐭𝐫𝐢𝐩 𝐨𝐯𝐞𝐫 𝐠𝐫𝐢𝐝𝐝𝐞𝐝 𝐝𝐚𝐭𝐚. earthmover.io/blog/tensors...

Fundamentals: Tensors vs. Tables | Earthmover

Why tabular tools trip over gridded data.

earthmover.io

Sebastian Galkin @functionth.bsky.social · Mar 28

I've worked on Icechunk almost exclusively for the last six months. I'm very proud of the result; you should check it out.

Earthmover @earthmover.io · Mar 28

1/ 🚀 Solving #NASA’s cloud data dilemma: Icechunk unlocks 100x faster access to archival data formats

We're thrilled to publish results from our pilot project with NASA and @developmentseed.org to enable high-performance cloud-native access for NASA’s 100s of petabytes of Earth observation data.

Reposted by Sebastian Galkin

Earthmover @earthmover.io · Feb 20

1/ Check out our latest blog post earthmover.io/blog/xarray-... to learn about the dramatic improvement in the performance of Xarray’s Zarr backend. We improved the “time to first byte” metric, building on Zarr-Python’s new asyncio internals.

Accelerating Xarray with Zarr-Python 3 | Earthmover

We have recently dramatically improved the performance of Xarray’s Zarr backend. This post explores how we’ve improved the “time to first byte” metric, building on Zarr-Python’s new asyncio internals.

earthmover.io
