Mooncake Labs
mooncakelabs.bsky.social
Mooncake Labs
@mooncakelabs.bsky.social
the simple lakehouse.
tldr: yes
January 22, 2025 at 5:18 AM
We'd actually want to take a step further. Data engineering && science will converge.

That's our job.
November 19, 2024 at 1:18 AM
(11/11)
We will continue to make pg_mooncake production-ready. Features like compaction, secondary index and constraints, partitioned tables are on our roadmap.
Mooncake 🥮 will build the lakehouse loved by developers. pg_mooncake is a start in this direction.
November 7, 2024 at 7:09 PM
(10/11)
Transactional updates/deletes:
An implicit row_id column is added during TableScan.
The delete & update operator tracks all the row_ids marked, reads rows from affected Parquet files & write new Parquet files.
All the table metadata are stored inside a Postgres table
November 7, 2024 at 7:09 PM
(9/11)
Now on Query lifetime.
Postgres parses & plans, detects columnstore tables, routes to DuckDB, then streams results back via pg_duckdb.
We also added a DuckDB storage extension to interact with our storage format.
November 7, 2024 at 7:09 PM
(8/11)
Table metadata is stored in Postgres for transactional consistency.
Both DuckDB and Postgres are highly extensible. Even their native tables are implemented using the same mechanisms, enabling third-party tables to achieve the same functionality with minimal overhead.
November 7, 2024 at 7:09 PM
(7/11)
INSERT and COPY will create new Parquet files, UPDATE and DELETE will remove existing Parquet files + create new ones
pg_mooncake is a TAM for columnstore table interface within postgres + a storage extension for Parquet files within DuckDB.
November 7, 2024 at 7:09 PM
(6/11)
Now the fun stuff., storage format.
While proprietary formats are better for perf, an open format allows sharing across engines.
We write Parquet & table metadata in open formats Iceberg and Delta Lake.
A mooncake table is directly readable outside of Postgres.
November 7, 2024 at 7:08 PM
(5/11)
pg_mooncake brings a columnstore table with both storage and compute that can leverage the format. It supports transactional inserts, updates, and deletes, just like regular Postgres tables.
November 7, 2024 at 7:07 PM
(4/11)
Recently, we’ve seen a trend to embed DuckDB inside Postgres.
Extensions like pg_duckdb and pg_analytics are designed for querying external columnar storage in Postgres, but they can’t write out data in those formats via transactional insert/update/deletes.
November 7, 2024 at 7:07 PM
(3/11)
Enter DuckDB 🦆
For the unfamiliar, think of DuckDB as SQLite for analytics. It can be embedded into processes, in our case, Postgres.
It’s fast, can scale beyond a single node, and its syntax follows Postgres closely.
November 7, 2024 at 7:07 PM
(2/11):
There has been little success of 'analytics in pg'.
Analytics require columnar storage, vectorized execution, late materialization –– things pg isn’t designed for.
Projects like Citus & Timescale added columnar storage, but there was no good vectorized execution engine.
November 7, 2024 at 7:07 PM