jd
@jd.codes
Programmer by day, father all day, programmer later on at night, & gamer even later at night.

❤️ Ruby, Python, and Elisp (and a little bit of Go maybe)

https://jd.codes
🔥 Woah, just learned about Taleshape for the first time!

I'm working on a greenfield product and am _very_ curious about using it. We have Metabase right now for internal BI but this looks great for at least the application feature side of things for our customers.
September 29, 2025 at 2:58 PM
I really wish it didn’t feel like people are hoping it was completely malicious. 😞
September 22, 2025 at 12:20 AM
It really is a problem that people who are so intelligent and innovative in one domain think their domain knowledge transfers to make them an authority on an unrelated topic. 😔
September 21, 2025 at 12:04 AM
I’m definitely onboard with this take.

It feels like they’re not “design patterns” so much as “collections of behavior” helping you discover (not build toward!) a specific role/set of responsibilities that the behavior defines. They’re heuristics for naming behavior, not implementation guides.
September 9, 2025 at 8:31 PM
Not really. I looked into it, but I just had a hard time shifting away from the asset-focused model Dagster has to a pipeline-focused one like Prefect's. I'd like to experiment with it one day though, 'cause it looks cool!
September 8, 2025 at 1:25 PM
It helps that our company is a year old and the data platform is of course very green. There may be a world where this stack has unforeseen limitations that’ll impact us, but that’s hard to imagine with what we know now.
September 8, 2025 at 12:35 AM
To be clear, DuckDB is our metadata database as well as our primary engine.
September 8, 2025 at 12:32 AM
Postgres is one possibility for that, but we’re just using DuckDB since it’s simpler and cheaper.
September 8, 2025 at 12:31 AM
We felt that Dagster's focus on materialized assets instead of the DAG itself fit our model better. The baked-in metadata management, lineage, and scheduling capabilities were also a big plus. Airflow is incredible software, but the overhead of managing these things is too high for our small team.
September 8, 2025 at 12:29 AM
MotherDuck is providing the storage for DuckLake parquet files (in their S3 buckets currently, although you can BYO) and providing compute instances. They have a feature called read scaling that lets you federate read-only DuckDB compute instances to a specific DB share, which is useful for us.
September 6, 2025 at 6:16 AM
Dagster is orchestrating all our pipelines end to end. We're pulling data from APIs, working on app DB ingestion, (soon) orchestrating LLM training, and (soon) isolated federated DB shares. DBT is just the modeling part of our pipeline atm.
September 6, 2025 at 6:14 AM
For large datasets the metadata is so useful because you get things like schema evolution, time travel (querying the DB as it was in the past), and a few other features. DuckLake's format is based on a database storing the metadata, and Iceberg is based on parsing actual files at query time.
September 6, 2025 at 12:10 AM
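The "metadata in a database" idea is easy to picture with a toy sketch. This uses hypothetical tables in sqlite3, not DuckLake's actual schema: each snapshot records which data files are live, so time travel is just a metadata query, no file parsing needed.

```python
import sqlite3

# Toy model of a DB-backed table format (NOT DuckLake's real schema):
# snapshots + a data_files table tracking when each file was added/removed.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE snapshots (snapshot_id INTEGER, created_at TEXT)")
con.execute("""CREATE TABLE data_files (
    file_path TEXT, added_in INTEGER, removed_in INTEGER)""")

# Snapshot 1: two parquet files written.
con.execute("INSERT INTO snapshots VALUES (1, '2025-09-01')")
con.executemany("INSERT INTO data_files VALUES (?, 1, NULL)",
                [("part-000.parquet",), ("part-001.parquet",)])

# Snapshot 2: a compaction replaces part-000 with part-002.
con.execute("INSERT INTO snapshots VALUES (2, '2025-09-05')")
con.execute("UPDATE data_files SET removed_in = 2 "
            "WHERE file_path = 'part-000.parquet'")
con.execute("INSERT INTO data_files VALUES ('part-002.parquet', 2, NULL)")

def live_files(snapshot_id):
    """Files visible when querying the table 'as of' a given snapshot."""
    rows = con.execute(
        """SELECT file_path FROM data_files
           WHERE added_in <= ? AND (removed_in IS NULL OR removed_in > ?)
           ORDER BY file_path""",
        (snapshot_id, snapshot_id)).fetchall()
    return [r[0] for r in rows]

print(live_files(1))  # ['part-000.parquet', 'part-001.parquet']
print(live_files(2))  # ['part-001.parquet', 'part-002.parquet']
```

Swap sqlite3 for any transactional database and you get DuckLake's core trick: one metadata query per read instead of walking manifest files.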
DuckLake is an open table format similar to Apache Iceberg. So your data is stored as files, and both Iceberg and DuckLake store additional data file metadata which acts as a mapping to the actual data files. The difference is, Iceberg is based on files and DuckLake is based on a DB.
September 6, 2025 at 12:07 AM
There’s a dagster-dbt integration library that exposes the DBT models and metadata to be translated into Dagster assets + metadata. It’s a pretty seamless integration. You even get the DBT tests as asset checks 😎. Dagster then just runs the DBT binary during materialization so it just works.
September 6, 2025 at 12:05 AM
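The core of that integration is only a few lines of wiring. This is a sketch, not runnable standalone: the manifest path is a placeholder, and it assumes you've run `dbt parse` in your own project first.

```python
from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

# Parses dbt's manifest.json and exposes every dbt model as a Dagster asset;
# dbt tests come along as asset checks.
@dbt_assets(manifest="target/manifest.json")  # placeholder path to your manifest
def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # During materialization Dagster just shells out to the dbt CLI.
    yield from dbt.cli(["build"], context=context).stream()
```

That's roughly all it takes for the lineage and metadata to show up in Dagster's UI.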
There's a bit of a learning curve with this stack, especially if you come from tools like Airflow, Spark, or Trino. But once you understand it, it's surprisingly simple. This is one of the things I'm most proud to have worked on in my career.
September 5, 2025 at 3:00 PM
Dagster's "software-defined assets" model is such a good heuristic for managing data orchestration pipelines. Its architecture is also incredibly simple and easy to maintain. Controlling dependencies, visualizing lineage, and integrations with tons of other tech make it a joy to work with.
September 5, 2025 at 3:00 PM
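If you've never used the asset-oriented model, here's a toy sketch of the idea in plain Python (a hypothetical `asset`/`materialize` pair, not Dagster's actual API): an asset declares its upstream assets as parameters, and the runner materializes dependencies first, which is where lineage falls out for free.

```python
# Toy sketch of "software-defined assets" (NOT Dagster's API).
ASSETS = {}

def asset(fn):
    """Register a function as a named asset."""
    ASSETS[fn.__name__] = fn
    return fn

def materialize(name, cache=None):
    """Materialize an asset, recursively materializing its dependencies.

    Dependencies are inferred from parameter names, mirroring how
    asset-oriented orchestrators wire their dependency graphs.
    """
    cache = {} if cache is None else cache
    if name not in cache:
        fn = ASSETS[name]
        deps = fn.__code__.co_varnames[:fn.__code__.co_argcount]
        cache[name] = fn(*(materialize(d, cache) for d in deps))
    return cache[name]

@asset
def raw_orders():
    return [{"id": 1, "amount": 40}, {"id": 2, "amount": 60}]

@asset
def daily_revenue(raw_orders):  # depends on raw_orders by name
    return sum(o["amount"] for o in raw_orders)

print(materialize("daily_revenue"))  # 100
```

The point is the inversion: you describe the data artifacts you want to exist, and scheduling, ordering, and lineage are derived from that description.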
It's a very simple stack whose core philosophy is _just write SQL_. DuckDB makes managing the data _easy_ and allows us to ingest sources natively from just about anywhere we'd need to. Using DuckLake also means we have almost 1:1 parity with an Iceberg stack w/o the overhead of metadata files.
September 5, 2025 at 3:00 PM
I’ve been seeing Helix pop up more and more in readme.mds for various local dev tooling… looks cool!
September 4, 2025 at 9:41 PM