Lightnews — Scholar-powered news

Mooncake Labs

@mooncakelabs.bsky.social

tldr: yes

January 22, 2025 at 5:18 AM

Mooncake Labs

@mooncakelabs.bsky.social

github.com/Mooncake-Lab... :)

GitHub - Mooncake-Labs/pg_mooncake: Iceberg/Delta Columnstore Table in Postgres

Iceberg/Delta Columnstore Table in Postgres. Contribute to Mooncake-Labs/pg_mooncake development by creating an account on GitHub.

github.com

December 12, 2024 at 7:07 PM

Mooncake Labs

@mooncakelabs.bsky.social

We'd actually want to take a step further. Data engineering && science will converge.

That's our job.

November 19, 2024 at 1:18 AM

Mooncake Labs

@mooncakelabs.bsky.social

(11/11)
We will continue to make pg_mooncake production-ready. Features like compaction, secondary index and constraints, partitioned tables are on our roadmap.
Mooncake 🥮 will build the lakehouse loved by developers. pg_mooncake is a start in this direction.

November 7, 2024 at 7:09 PM

Mooncake Labs

@mooncakelabs.bsky.social

(10/11)
Transactional updates/deletes:
An implicit row_id column is added during TableScan.
The delete & update operator tracks all the row_ids marked, reads rows from affected Parquet files & write new Parquet files.
All the table metadata are stored inside a Postgres table

November 7, 2024 at 7:09 PM

Mooncake Labs

@mooncakelabs.bsky.social

(9/11)
Now on Query lifetime.
Postgres parses & plans, detects columnstore tables, routes to DuckDB, then streams results back via pg_duckdb.
We also added a DuckDB storage extension to interact with our storage format.

November 7, 2024 at 7:09 PM

Mooncake Labs

@mooncakelabs.bsky.social

(8/11)
Table metadata is stored in Postgres for transactional consistency.
Both DuckDB and Postgres are highly extensible. Even their native tables are implemented using the same mechanisms, enabling third-party tables to achieve the same functionality with minimal overhead.

November 7, 2024 at 7:09 PM

Mooncake Labs

@mooncakelabs.bsky.social

(7/11)
INSERT and COPY will create new Parquet files, UPDATE and DELETE will remove existing Parquet files + create new ones
pg_mooncake is a TAM for columnstore table interface within postgres + a storage extension for Parquet files within DuckDB.

November 7, 2024 at 7:09 PM

Mooncake Labs

@mooncakelabs.bsky.social

(6/11)
Now the fun stuff., storage format.
While proprietary formats are better for perf, an open format allows sharing across engines.
We write Parquet & table metadata in open formats Iceberg and Delta Lake.
A mooncake table is directly readable outside of Postgres.

November 7, 2024 at 7:08 PM

Mooncake Labs

@mooncakelabs.bsky.social

(5/11)
pg_mooncake brings a columnstore table with both storage and compute that can leverage the format. It supports transactional inserts, updates, and deletes, just like regular Postgres tables.

November 7, 2024 at 7:07 PM

Mooncake Labs

@mooncakelabs.bsky.social

(4/11)
Recently, we’ve seen a trend to embed DuckDB inside Postgres.
Extensions like pg_duckdb and pg_analytics are designed for querying external columnar storage in Postgres, but they can’t write out data in those formats via transactional insert/update/deletes.

November 7, 2024 at 7:07 PM

Mooncake Labs

@mooncakelabs.bsky.social

(3/11)
Enter DuckDB 🦆
For the unfamiliar, think of DuckDB as SQLite for analytics. It can be embedded into processes, in our case, Postgres.
It’s fast, can scale beyond a single node, and its syntax follows Postgres closely.

November 7, 2024 at 7:07 PM

Mooncake Labs

@mooncakelabs.bsky.social

(2/11):
There has been little success of 'analytics in pg'.
Analytics require columnar storage, vectorized execution, late materialization –– things pg isn’t designed for.
Projects like Citus & Timescale added columnar storage, but there was no good vectorized execution engine.

November 7, 2024 at 7:07 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news