Mimoune Djouallah
mimdj.bsky.social
#MicrosoftFabric customer advocate, interested in Small Data & Self Service. #Microsoftemployee since Dec 2023, but my tweets are my own
Reposted by Mimoune Djouallah
Any system that allows exchanging real money for stuff with an element of chance is morally equivalent to a casino.

Corollary: Pokémon cards, Roblox, Labubus, and even claw machines should all be 18+
Do you have any extremely niche, but serious, ethical stances?
you are looking at #duckdb running TPC-H at 1 TB with only 16 cores
it used to crash even with 64

pip install duckdb --upgrade is an act of faith basically
Put together a small Python package, duckrun :) Point it at a folder of SQL/Python files, define a pipeline, and it will create Delta tables in #OneLake with #DuckDB and #delta_rs

github.com/djouallah/du...
actually, the #Microsoftfabric Data Warehouse automatically exposes an Iceberg REST Catalog
thanks to the #duckdb UI extension, you can see a proper catalog
Reposted by Mimoune Djouallah
#pyconau @mimdj.bsky.social Life Beyond Pandas: Workflows with DuckDB, Daft, Polars, and Datafusion http://youtu.be/SnogunyMnE8
2 months ago, I got access to a beta release of the #onelake #Apacheiceberg REST Catalog; the first thing I did was run it with #duckdb 😀
the storage format should not be tied to #SQL logic, #duckdb got it so right!!! but a bit sad that #deltalake is left behind :(
you know me too well :)
Good news: #duckdb added support for reading and writing the geometry data type

Bad news: other Fabric engines don't support it yet, so it is not very useful for now :(
I nearly get the logic behind delta parquet, but clearly people did not like it 😅
I was listening to a guy from S3 and he described it beautifully: those table formats are a way to manage Parquet!!!
Third time is the charm ✨
With the much needed improvements to the #MicrosoftFabric scheduler, I revisited my review of Fabric F2.
#duckdb #sql
www.youtube.com/watch?v=tchY...
Third Look at Fabric F2
YouTube video by DataMonkey
maybe decoupling storage from compute was not a very good idea 😅 I am joking, it was worth it
Btw, because engines are getting so fast, the bottleneck becomes remote storage bandwidth 😁 not compute
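A rough back-of-the-envelope sketch of that point, in Python. Every number here (data size, link speed, per-core throughput) is an illustrative assumption, not a measurement from any real engine or cloud; the only thing it shows is how to compare the two limits.

```python
# Back-of-the-envelope: is a scan limited by compute or by remote storage?
# All figures below are illustrative assumptions, not measurements.

def scan_seconds(data_gb: float, throughput_gb_per_s: float) -> float:
    """Time to move or process data_gb at a sustained throughput in GB/s."""
    return data_gb / throughput_gb_per_s

data_gb = 1000.0            # assumed: about 1 TB of Parquet to scan
network_gb_per_s = 10 / 8   # assumed: 10 Gbit/s link, i.e. 1.25 GB/s
cores = 16                  # assumed core count
per_core_gb_per_s = 0.5     # assumed: each core chews through 0.5 GB/s

network_time = scan_seconds(data_gb, network_gb_per_s)
compute_time = scan_seconds(data_gb, cores * per_core_gb_per_s)

# Whichever stage takes longer is the bottleneck.
bottleneck = "remote storage" if network_time > compute_time else "compute"
print(f"network: {network_time:.0f}s, compute: {compute_time:.0f}s, "
      f"bottleneck: {bottleneck}")
```

With these assumed figures the engine could finish in about 125 seconds, but it has to wait roughly 800 seconds for the bytes to arrive; change the assumptions and the answer can flip.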
Writing #ApacheIceberg in Azure is not particularly hard, but you do need a catalog (essentially a database). For simple tests, you can use an in-memory DB
#ADLS #opentableformat #PyIceberg.
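To make "a catalog is essentially a database" concrete, here is a minimal conceptual sketch using only Python's built-in sqlite3 with an in-memory DB. This is not the PyIceberg API: the table layout, the function names (`create_table`, `commit`, `current`) and the metadata paths are all hypothetical, and a real catalog tracks far more. It only illustrates the core job, which is mapping a table name to its current metadata file and swapping that pointer atomically.

```python
import sqlite3

# Hypothetical, simplified catalog: one row per table, pointing at the
# table's current metadata file. Not the PyIceberg API.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalog ("
    " table_name TEXT PRIMARY KEY,"
    " metadata_location TEXT NOT NULL)"
)

def create_table(name: str, location: str) -> None:
    """Register a new table and its first metadata file."""
    with conn:
        conn.execute("INSERT INTO catalog VALUES (?, ?)", (name, location))

def commit(name: str, expected: str, new: str) -> bool:
    """Swap the pointer only if it still equals `expected` (compare-and-swap)."""
    with conn:
        cur = conn.execute(
            "UPDATE catalog SET metadata_location = ?"
            " WHERE table_name = ? AND metadata_location = ?",
            (new, name, expected),
        )
    return cur.rowcount == 1

def current(name: str):
    """Return the current metadata location, or None if the table is unknown."""
    row = conn.execute(
        "SELECT metadata_location FROM catalog WHERE table_name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

# Hypothetical metadata paths, for illustration only.
create_table("db.events", "metadata/v1.metadata.json")
print(commit("db.events", "metadata/v1.metadata.json", "metadata/v2.metadata.json"))
# A writer that still believes v1 is current loses the race:
print(commit("db.events", "metadata/v1.metadata.json", "metadata/v3.metadata.json"))
print(current("db.events"))
```

The compare-and-swap in `commit` is the reason a plain folder of files is not enough: two writers need something transactional to agree on which metadata file is the current one.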
Apologies, I thought you were joking. For any new project, I don't think using pandas is a good idea