@apachedatafusion.bsky.social meetup on Wednesday: Datadog reports they run 68+ million queries per hour with DataFusion
New Apache Parquet Community page is up: parquet.apache.org/community/
arrow.apache.org/blog/2025/10...
www.palantir.com/docs/foundry...
ALP achieves ZSTD levels of compression and much faster decode. We are discussing adding it to @ApacheParquet: lists.apache.org/thread/tjtln...
We are also working on a blog post that has a deeper explanation
Which they based on the tpchgen-rs project from @clflushopt.bsky.social github.com/clflushopt/t...
(BTW I am still looking for some more GitHub watchers on tpchgen-rs so I can get it on Homebrew)
However, I absolutely think this adds to the pressure for Parquet to evolve.
Speaking of, anyone interested in helping add new encodings to parquet?
lists.apache.org/thread/djnbb...
github.com/jcsherin/dat...
datafusion.apache.org/contributor-...
More deets here github.com/apache/dataf...
BTW @xiangpeng.systems is looking for some early adopters who want to be on the bleeding edge. Hit me up if interested
Here are the slides: docs.google.com/presentation...
We just need 6 more forks and 24 watchers. Can you help us out?
github.com/clflushopt/t...
Deets: github.com/clflushopt/t...
SF1000 (1TB raw, 220GB in @ApacheParquet) in less than 10 mins (6m45s) on an aging laptop
Try it now:
pip install tpchgen-cli
tpchgen-cli --scale-factor 1000 --parts 100 --format=parquet
github.com/clflushopt/t...
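For a rough sense of what those numbers imply, here is a back-of-the-envelope sketch. It assumes 1TB means 1000GB and treats the 6m45s as the full wall-clock time for the run; the variable names are purely illustrative and nothing here comes from tpchgen-cli itself.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the post.
# Assumption: 1 TB = 1000 GB (decimal units); 6m45s is total wall-clock time.
raw_gb = 1000.0          # SF1000 raw size
parquet_gb = 220.0       # size once written as Parquet
elapsed_s = 6 * 60 + 45  # 6m45s in seconds

compression_ratio = raw_gb / parquet_gb      # raw size relative to Parquet size
raw_equiv_gb_per_s = raw_gb / elapsed_s      # raw-equivalent data generated per second
parquet_gb_per_s = parquet_gb / elapsed_s    # Parquet bytes actually produced per second

print(f"elapsed: {elapsed_s} s")
print(f"raw-to-Parquet ratio: {compression_ratio:.2f}x")
print(f"raw-equivalent rate: {raw_equiv_gb_per_s:.2f} GB/s")
print(f"Parquet output rate: {parquet_gb_per_s:.2f} GB/s")
```

So, under those assumptions, the Parquet output is roughly a fifth to a quarter of the raw size, and the generator is producing raw-equivalent data at a couple of GB per second.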
Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Parquet with @apachedatafusion.bsky.social
datafusion.apache.org/blog/2025/08...
www.theregister.com/2025/06/20/e...
@apachedatafusion.bsky.social blog from Qi Zhi, Jigao Luo and myself explains how datafusion.apache.org/blog/2025/07...