jd
@jd.codes
Programmer by day, father all day, programmer later on at night, & gamer even later at night.

❤️ Ruby, Python, and Elisp (and a little bit of Go maybe)

https://jd.codes
🔥 Woah, just learned about Taleshape for the first time!

I'm working on a greenfield product and am _very_ curious about using it. We have Metabase right now for internal BI but this looks great for at least the application feature side of things for our customers.
September 29, 2025 at 2:58 PM
I really wish it didn’t feel like people are hoping it was completely malicious. 😞
September 22, 2025 at 12:20 AM
It really is a problem that people who are so intelligent and innovative in one domain think their domain knowledge transfers to make them an authority on an unrelated topic. 😔
September 21, 2025 at 12:04 AM
I’m definitely onboard with this take.

It feels like they’re not “design patterns” so much as “collections of behavior” helping you discover (not build toward!) a specific role/set of responsibilities that the behavior defines. They’re heuristics for naming behavior, not implementation guides.
September 9, 2025 at 8:31 PM
Not really. I looked into it, but I just had a hard time shifting away from the asset-focused model Dagster has to a pipeline-focused one like Prefect's. I'd like to experiment with it one day though, 'cause it looks cool!
September 8, 2025 at 1:25 PM
It helps that our company is a year old and the data platform is of course very green. There may be a world where this stack has unforeseen limitations that’ll impact us, but that’s hard to imagine with what we know now.
September 8, 2025 at 12:35 AM
To be clear, DuckDB is our metadata database as well as our primary engine.
September 8, 2025 at 12:32 AM
Postgres is one possibility for that, but we’re just using DuckDB since it’s simpler and cheaper.
September 8, 2025 at 12:31 AM
We felt that Dagster's focus on materialized assets instead of the DAG itself fit our model better. The baked-in metadata management, lineage, and scheduling capabilities were also a big plus. Airflow is incredible software, but the overhead of managing these things is too high for our small team.
September 8, 2025 at 12:29 AM
MotherDuck is providing the storage for DuckLake parquet files (in their S3 buckets currently, although you can BYO) and providing compute instances. They have a feature called read scaling that lets you federate read-only DuckDB compute instances to a specific DB share, which is useful for us.
September 6, 2025 at 6:16 AM
Dagster is orchestrating all our pipelines end to end. We're pulling data from APIs, working on app DB ingestion, (soon) orchestrating LLM training, and (soon) isolated federated DB shares. DBT is just the modeling part of our pipeline atm.
September 6, 2025 at 6:14 AM
For large datasets the metadata is so useful because you get things like schema evolution, time travel (querying the DB as it was in the past), and a few other features. DuckLake's format is based on a database storing the metadata, and Iceberg is based on parsing actual files at query time.
September 6, 2025 at 12:10 AM
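The "metadata in a database" idea is easy to picture with a toy sketch. This uses hypothetical tables in sqlite3, not DuckLake's actual schema: each snapshot records which data files are live, so time travel is just a metadata query, no file parsing needed.

```python
import sqlite3

# Toy model of a DB-backed table format (NOT DuckLake's real schema):
# snapshots + a data_files table tracking when each file was added/removed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE snapshots (snapshot_id INTEGER, created_at TEXT)")
con.execute("""CREATE TABLE data_files (
    file_path TEXT, added_in INTEGER, removed_in INTEGER)""")

# Snapshot 1: two parquet files written.
con.execute("INSERT INTO snapshots VALUES (1, '2025-09-01')")
con.executemany("INSERT INTO data_files VALUES (?, 1, NULL)",
                [("part-000.parquet",), ("part-001.parquet",)])

# Snapshot 2: a compaction replaces part-000 with part-002.
con.execute("INSERT INTO snapshots VALUES (2, '2025-09-05')")
con.execute("UPDATE data_files SET removed_in = 2 "
            "WHERE file_path = 'part-000.parquet'")
con.execute("INSERT INTO data_files VALUES ('part-002.parquet', 2, NULL)")

def live_files(snapshot_id):
    """Files visible when querying the table 'as of' a given snapshot."""
    rows = con.execute(
        """SELECT file_path FROM data_files
           WHERE added_in <= ? AND (removed_in IS NULL OR removed_in > ?)
           ORDER BY file_path""",
        (snapshot_id, snapshot_id)).fetchall()
    return [r[0] for r in rows]

print(live_files(1))  # ['part-000.parquet', 'part-001.parquet']
print(live_files(2))  # ['part-001.parquet', 'part-002.parquet']
```

Swap sqlite3 for any transactional database and you get DuckLake's core trick: one metadata query per read instead of walking manifest files.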
DuckLake is an open table format similar to Apache Iceberg. So your data is stored as files, and both Iceberg and DuckLake store additional data file metadata which acts as a mapping to the actual data files. The difference is, Iceberg is based on files and DuckLake is based on a DB.
September 6, 2025 at 12:07 AM
There’s a dagster-dbt integration library that exposes the DBT models and metadata to be translated into Dagster assets + metadata. It’s a pretty seamless integration. You even get the DBT tests as asset checks 😎. Dagster then just runs the DBT binary during materialization so it just works.
September 6, 2025 at 12:05 AM
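The core of that integration is only a few lines of wiring. This is a sketch, not runnable standalone: the manifest path is a placeholder, and it assumes you've run `dbt parse` in your own project first.

```python
from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

# Parses dbt's manifest.json and exposes every dbt model as a Dagster asset;
# dbt tests come along as asset checks.
@dbt_assets(manifest="target/manifest.json")  # placeholder path to your manifest
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # During materialization Dagster just shells out to the dbt CLI.
    yield from dbt.cli(["build"], context=context).stream()
```

That's roughly all it takes for the lineage and metadata to show up in Dagster's UI.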
There's a bit of a learning curve with this stack, especially if you come from tools like Airflow, Spark, or Trino. But once you understand it, it's surprisingly simple. This is one of the things I'm most proud to have worked on in my career.
September 5, 2025 at 3:00 PM
Dagster's "software-defined assets" model is such a good heuristic for managing data orchestration pipelines. Its architecture is also incredibly simple and easy to maintain. Controlling dependencies, visualizing lineage, and integrations with tons of other tech make it a joy to work with.
September 5, 2025 at 3:00 PM
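If you've never used the asset-oriented model, here's a toy sketch of the idea in plain Python (a hypothetical `asset`/`materialize` pair, not Dagster's actual API): an asset declares its upstream assets as parameters, and the runner materializes dependencies first, which is where lineage falls out for free.

```python
# Toy sketch of "software-defined assets" (NOT Dagster's API).
ASSETS = {}

def asset(fn):
    """Register a function as a named asset."""
    ASSETS[fn.__name__] = fn
    return fn

def materialize(name, cache=None):
    """Materialize an asset, recursively materializing its dependencies.

    Dependencies are inferred from parameter names, mirroring how
    asset-oriented orchestrators wire their dependency graphs.
    """
    cache = {} if cache is None else cache
    if name not in cache:
        fn = ASSETS[name]
        deps = fn.__code__.co_varnames[:fn.__code__.co_argcount]
        cache[name] = fn(*(materialize(d, cache) for d in deps))
    return cache[name]

@asset
def raw_orders():
    return [{"id": 1, "amount": 40}, {"id": 2, "amount": 60}]

@asset
def daily_revenue(raw_orders):  # depends on raw_orders by name
    return sum(o["amount"] for o in raw_orders)

print(materialize("daily_revenue"))  # 100
```

The point is the inversion: you describe the data artifacts you want to exist, and scheduling, ordering, and lineage are derived from that description.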
It's a very simple stack whose core philosophy is _just write SQL_. DuckDB makes managing the data _easy_ and allows us to ingest sources natively from just about anywhere we'd need to. Using DuckLake also means we have almost 1:1 parity with an Iceberg stack w/o the overhead of metadata files.
September 5, 2025 at 3:00 PM
I’ve been seeing Helix pop up more and more in readme.mds for various local dev tooling… looks cool!
September 4, 2025 at 9:41 PM