Sarah Krasnik Bedell
@sarahkb.bsky.social
Growth marketer & data engineer. Currently growth @ Prefect, prev data @ Perpay.
📍 VT, ⛷️ & ⛵️
sarahsnewsletter.substack.com
my takeaway today: I have a masculine tone 😂
February 7, 2025 at 8:51 PM
Yeah, that's what I'm thinking: a pile is definitely bad. Single feels so convenient, but I'm wondering whether that should be done or not. To be clear, I do it all the time too for convenience.
January 7, 2025 at 12:25 PM
Reposted by Sarah Krasnik Bedell
Yeah, I used (or tried to use) Airflow, but it kept eating so much RAM that I couldn't do anything else on my laptop.
Switched to Prefect and never looked back.
I wish it was more widespread in the industry.
January 6, 2025 at 5:32 PM
If you're trying to grow as an IC: data engineering (a requirement for every other data function)
If you're trying to run a data team: analytics (learning to work with stakeholders)
Or, use data as a gateway to learning and evolve your career once again
December 10, 2024 at 4:07 AM
I guess what I mean is: OL is a framework, but it relies on other tools to be useful. I'm thinking about where we'll go for the one-stop shop for answering "this thing failed, why tho?"
November 27, 2024 at 3:26 PM
I'm curious: why OL?
I recently watched the Airflow Summit 2023 video on it. Isn't it just an Airflow plugin for DAGs that relies on manual hooks and lacks deep integration with data or infra assets? I'd also expect some UI around lineage.
If I'm making naive assumptions, correct me.
November 27, 2024 at 3:23 PM
Yup, 100%. But then tie that audit log to actual assets.
November 27, 2024 at 3:13 PM
I think we need to step back and define lineage. Before, we defined it just in terms of data assets.
But what if you're running a Python ETL process pre-warehouse and your infra dies? The output of that job would be out of date.
That's also lineage, and not in SQL. So we need to solve for that too.
November 27, 2024 at 1:31 PM
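A minimal sketch of that idea, assuming lineage is kept as a plain adjacency map in which a pre-warehouse Python job is a node just like the SQL models it feeds. The node names (`extract_orders_job`, `raw.orders`, `fct_orders`, and so on) are hypothetical, and this isn't tied to any particular lineage tool. When the job's infra dies, walking downstream surfaces every asset that is now out of date, even though nothing in SQL failed.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes that consume its output.
# The first node is a Python ETL job that runs before the warehouse, not a SQL model.
DOWNSTREAM: dict[str, list[str]] = {
    "extract_orders_job": ["raw.orders"],   # Python job on its own infra
    "raw.orders": ["stg_orders"],           # landing table the job writes
    "stg_orders": ["fct_orders"],           # SQL model
    "fct_orders": ["revenue_dashboard"],    # SQL model feeding BI
    "revenue_dashboard": [],                # BI asset
}


def stale_assets(failed_node: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from a failed node, returning every downstream node
    (each once, in BFS order) whose data is now out of date."""
    seen: set[str] = set()
    stale: list[str] = []
    queue = deque(graph.get(failed_node, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        stale.append(node)
        queue.extend(graph.get(node, []))
    return stale


if __name__ == "__main__":
    # The infra running the Python job dies before the warehouse is ever touched.
    print(stale_assets("extract_orders_job", DOWNSTREAM))
    # → ['raw.orders', 'stg_orders', 'fct_orders', 'revenue_dashboard']
```

The only design choice here is that jobs and tables share one graph, so a non-SQL failure propagates exactly the way a broken SQL model would.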
Not quite sure of your exact use case, but check out @prefect.io: retries, logging, and caching right out of the box.
November 27, 2024 at 12:23 PM
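In Prefect these are options on the task decorator (for example `@task(retries=3, retry_delay_seconds=10)`). To show what "out of the box" saves you, here's a stdlib-only sketch of the same three behaviors hand-rolled as one decorator. It's an illustration, not how Prefect implements them; `resilient_task` and `fetch_rate` are hypothetical names.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")


def resilient_task(retries: int = 3, delay_seconds: float = 0.0):
    """Hypothetical decorator: retry on any exception, log each attempt, and
    cache successful results in memory keyed on the call's arguments."""
    def decorator(fn):
        cache = {}

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Arguments must be hashable to serve as a cache key.
            key = (args, tuple(sorted(kwargs.items())))
            if key in cache:
                log.info("%s: cache hit", fn.__name__)
                return cache[key]
            total = retries + 1  # first try plus `retries` retries
            for attempt in range(1, total + 1):
                try:
                    result = fn(*args, **kwargs)
                except Exception as exc:
                    log.warning("%s: attempt %d/%d failed: %s", fn.__name__, attempt, total, exc)
                    if attempt == total:
                        raise  # out of retries: surface the last error
                    time.sleep(delay_seconds)
                else:
                    log.info("%s: succeeded on attempt %d", fn.__name__, attempt)
                    cache[key] = result
                    return result
        return wrapper
    return decorator


calls = {"n": 0}


@resilient_task(retries=3)
def fetch_rate(currency: str) -> float:
    # Hypothetical flaky source: the first two calls hit a transient error.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return 1.08


print(fetch_rate("EUR"))  # fails twice, succeeds on attempt 3 → 1.08
print(fetch_rate("EUR"))  # cache hit, no new call → 1.08
print(calls["n"])         # → 3
```

Note that this cache lives only in process memory and disappears when the script exits.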
A little bit of a different flavor, but I wrote this back in 2022 and feel like it still mostly applies today.
sarahsnewsletter.substack.com/p/everyone-s...
Everyone Should Care About Data Storage
From data warehouses, lakes, to realtime applications: they’re all part of the journey to making data useful.
sarahsnewsletter.substack.com
November 27, 2024 at 12:22 PM
Query languages for the SQL-esque ones, and data Python packages for the others (to be exact).
November 27, 2024 at 12:20 PM
So: make sure to run only your ML work on expensive GPUs, and run your lightweight ETL on small compute instead, using the warehouse credits you need to spend before 2025.
I'm hearing this is a problem when data eng / data platform become different teams.
Who's encountered this?
#dataBS
November 27, 2024 at 2:44 AM
Reddit is quickly growing its user base and content. There's definitely more mess than before, but I've found that posting genuine, detailed comments gets engagement.
PS: ignore the trolls, it's the only way forward.
November 15, 2024 at 1:55 PM
Sure, that's one use case.
But what if an event happens and the thing it triggers is throttled to run only once every 5 min? Then it's not realtime.
I think realtime is about the SLA of the output the event is triggering.
So there's a Venn diagram with an overlapping middle.
November 12, 2024 at 12:55 AM
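A small sketch of the throttling case, assuming a run can start at most once per throttle window and each run picks up every event that arrived at or before its start. The event times, the 300-second throttle, the 20-second run time, and the SLAs are made-up numbers; `event_latencies` and `meets_sla` are hypothetical helpers. The pipeline is event-based either way, but whether it counts as realtime depends on the SLA you hold its output to.

```python
def event_latencies(event_times: list[float], throttle_s: float, run_s: float) -> list[float]:
    """Seconds from each event (in time order) until the output it triggered is ready.

    Assumed throttle: run starts are at least throttle_s apart, and a run picks up
    every event that arrived at or before its start time.
    """
    run_starts: list[float] = []
    latencies: list[float] = []
    for t in sorted(event_times):
        if run_starts and t <= run_starts[-1]:
            start = run_starts[-1]  # a run is already scheduled and will cover this event
        else:
            earliest = run_starts[-1] + throttle_s if run_starts else t
            start = max(t, earliest)  # wait out the throttle window if needed
            run_starts.append(start)
        latencies.append(start + run_s - t)
    return latencies


def meets_sla(latencies: list[float], sla_s: float) -> bool:
    """Realtime here means the slowest event-to-output delay is within the SLA."""
    return max(latencies) <= sla_s


events = [0, 10, 250, 400]  # seconds; hypothetical event arrivals
throttled = event_latencies(events, throttle_s=300, run_s=20)
unthrottled = event_latencies(events, throttle_s=0, run_s=20)

print(throttled)                   # → [20, 310, 70, 220]
print(meets_sla(throttled, 60))    # event-based, but not realtime at a 60 s SLA → False
print(meets_sla(throttled, 3600))  # fine against an hourly SLA → True
print(meets_sla(unthrottled, 60))  # → True
```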
Totally fair. I do think realtime and event-based often get confused as one and the same, which they're not.
November 12, 2024 at 12:52 AM
Claude can adjust to tone so much better (even with equal context). I'm a convert.
November 11, 2024 at 2:14 PM