That's our job.
We will continue to make pg_mooncake production-ready. Features like compaction, secondary indexes, constraints, and partitioned tables are on our roadmap.
Mooncake 🥮 will build the lakehouse loved by developers. pg_mooncake is a start in this direction.
Transactional updates/deletes:
An implicit row_id column is added during TableScan.
The delete and update operators track all marked row_ids, read rows from the affected Parquet files, and write new Parquet files.
All table metadata is stored inside a Postgres table.
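The copy-on-write flow above can be sketched in a few lines. This is an illustrative Python model, not pg_mooncake's actual implementation: lists stand in for immutable Parquet files, and the `scan`/`delete` names are ours.

```python
# Sketch of a copy-on-write DELETE over immutable "Parquet files".
# Each file is modeled as a list of rows; a row_id is a (file_idx, offset)
# pair attached at scan time, mirroring the implicit row_id column added
# during TableScan.

def scan(files):
    """Yield (row_id, row) pairs; the row_id identifies file and offset."""
    for file_idx, rows in enumerate(files):
        for offset, row in enumerate(rows):
            yield (file_idx, offset), row

def delete(files, predicate):
    """Mark matching row_ids, then rewrite only the affected files."""
    marked = {rid for rid, row in scan(files) if predicate(row)}
    affected = {file_idx for file_idx, _ in marked}
    new_files = []
    for file_idx, rows in enumerate(files):
        if file_idx not in affected:
            new_files.append(rows)       # untouched files are kept as-is
        else:
            kept = [row for offset, row in enumerate(rows)
                    if (file_idx, offset) not in marked]
            if kept:
                new_files.append(kept)   # rewrite file without deleted rows
    return new_files

files = [[1, 2, 3], [4, 5, 6]]
files = delete(files, lambda row: row % 2 == 0)
# files is now [[1, 3], [5]]
```

Note that files containing no matching rows are never rewritten; that is what makes tracking row_ids per file worthwhile.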
Now, on to query lifetime.
Postgres parses and plans the query, detects columnstore tables, routes execution to DuckDB, then streams results back via pg_duckdb.
We also added a DuckDB storage extension to interact with our storage format.
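The routing step can be sketched as a simple dispatch on the tables a plan touches. This is a hedged Python illustration; the real hook lives in Postgres's planner via pg_duckdb, and every name here (`COLUMNSTORE_TABLES`, `route`) is ours, not the extension's.

```python
# Sketch: after parsing/planning, route a query to DuckDB if it touches
# any columnstore table; otherwise keep it on the Postgres executor.

COLUMNSTORE_TABLES = {"events", "metrics"}   # hypothetical catalog lookup

def referenced_tables(plan):
    """Stand-in for walking the Postgres plan tree for base relations."""
    return set(plan["tables"])

def route(plan):
    if referenced_tables(plan) & COLUMNSTORE_TABLES:
        return "duckdb"      # vectorized execution; results stream back
    return "postgres"        # regular heap tables stay on the row engine

print(route({"tables": ["events"]}))   # duckdb
print(route({"tables": ["users"]}))    # postgres
```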
Table metadata is stored in Postgres for transactional consistency.
Both DuckDB and Postgres are highly extensible. Even their native tables are implemented using the same mechanisms, enabling third-party tables to achieve the same functionality with minimal overhead.
INSERT and COPY create new Parquet files; UPDATE and DELETE remove existing Parquet files and create new ones.
pg_mooncake is a table access method (TAM) for the columnstore table interface within Postgres, plus a storage extension for Parquet files within DuckDB.
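The file-level effects of each statement can be sketched as set deltas over the table's Parquet file list. Illustrative Python only; in pg_mooncake the commit record lives in a Postgres metadata table, and these function names are ours.

```python
# Sketch: each statement's effect on the table's set of Parquet files,
# recorded as an (added, removed) delta -- the shape of one commit.

def insert_commit(new_files):
    return (set(new_files), set())              # INSERT/COPY: additions only

def update_commit(rewritten, replacements):
    return (set(replacements), set(rewritten))  # UPDATE/DELETE: remove + add

def apply(current, commit):
    added, removed = commit
    return (current - removed) | added

files = set()
files = apply(files, insert_commit({"a.parquet"}))
files = apply(files, update_commit({"a.parquet"}, {"b.parquet"}))
# files == {"b.parquet"}
```

Because commits are just rows in a Postgres table, they inherit Postgres's transactional semantics: a rolled-back UPDATE never publishes its delta.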
Now, the fun stuff: storage format.
While proprietary formats are better for performance, an open format allows sharing data across engines.
We write Parquet files and table metadata in the open formats Iceberg and Delta Lake.
A mooncake table is directly readable outside of Postgres.
pg_mooncake brings a columnstore table with both storage and compute that can leverage the format. It supports transactional inserts, updates, and deletes, just like regular Postgres tables.
Recently, we’ve seen a trend to embed DuckDB inside Postgres.
Extensions like pg_duckdb and pg_analytics are designed for querying external columnar storage in Postgres, but they can't write out data in those formats via transactional inserts, updates, and deletes.
Enter DuckDB 🦆
For the unfamiliar, think of DuckDB as SQLite for analytics. It can be embedded into other processes; in our case, Postgres.
It’s fast, can scale beyond a single node, and its syntax follows Postgres closely.
There has been little success with 'analytics in pg'.
Analytics requires columnar storage, vectorized execution, and late materialization: things pg isn't designed for.
Projects like Citus & Timescale added columnar storage, but there was no good vectorized execution engine.