Dylan Pieper
dylanpieper.bsky.social
Data scientist @ Pitt • Dog dad 🐕 • Pilot 🪂 • #rstats • https://dylanpieper.github.io
Reposted by Dylan Pieper
Stretching DuckDB w/ Common Crawl, ~1.7B rows, ~300 parquet files. ~2-3s for single-column aggregations, ~2-3 mins to SUMMARIZE the data, peaking at ~12-14GB memory usage. Not exactly real-time, but the fact you can do this on a laptop with no server setups or Spark pipelines is still amazing.
August 15, 2025 at 3:10 AM
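Here is a minimal R sketch of the two kinds of queries described above. It assumes the {duckdb} and {DBI} packages. Two small generated parquet files stand in for the ~300 Common Crawl files, so the file names, columns, and sizes are hypothetical:

```r
library(DBI)
library(duckdb)

# In-process DuckDB: no server setup required
con <- dbConnect(duckdb())

# Hypothetical stand-in for the Common Crawl files: two small parquet files
# written by DuckDB itself (columns `id` and `bucket` are made up)
dir <- tempfile("crawl_")
dir.create(dir)
for (i in 1:2) {
  dbExecute(con, sprintf(
    "COPY (SELECT range AS id, range %% 7 AS bucket FROM range(1000))
     TO '%s' (FORMAT PARQUET)",
    file.path(dir, sprintf("part-%d.parquet", i))
  ))
}

# A glob lets read_parquet() scan every file as one table
files <- file.path(dir, "*.parquet")

# Single-column aggregation: only the `bucket` column has to be read
agg <- dbGetQuery(con, sprintf(
  "SELECT bucket, count(*) AS n FROM read_parquet('%s') GROUP BY bucket ORDER BY bucket",
  files
))

# SUMMARIZE profiles every column (min, max, approx_unique, avg, quantiles,
# null_percentage, ...), one reason it takes much longer on large data
summ <- dbGetQuery(con, sprintf("SUMMARIZE SELECT * FROM read_parquet('%s')", files))

dbDisconnect(con)
```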
Reposted by Dylan Pieper
Remember this #rstats post? I wasn't the only one talking about it & the tidyverse team was listening 😎 #databs

New #dplyr functions? They're looking for feedback!!
🤔 replace_when, recode_values, replace_values

👀 Read this:
github.com/tidyverse/ti...

🗣️ Comment on PR:
github.com/tidyverse/ti...
Ever needed to recode a variable based on whether it matches something in a named list? Have you ever used rlang's `!!!` operator? It's pretty dang cool. I feel like {rlang} is so powerful, but so enigmatic 😂 #rstats

Here's a pic & gist (bc it's frustrating to not be able to copy/paste the pic) #databs
August 4, 2025 at 5:30 PM
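Because the pic can't be copied, here is one way to do this in R. It is a sketch and not necessarily what the gist does. It uses dplyr's `recode()` (superseded since dplyr 1.1.0 but still available) and splices a named list into its `...` with `!!!`. The lookup values are hypothetical:

```r
library(dplyr)

# Hypothetical lookup: names are the values to match, elements are the new labels
lookup <- list(a = "apple", b = "banana", c = "cherry")

codes <- c("a", "b", "c", "a", "z")

# `!!!` splices the list, so this is the same as writing
# recode(codes, a = "apple", b = "banana", c = "cherry")
recoded <- recode(codes, !!!lookup)

# Character values with no match ("z") are left unchanged
recoded
```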
Reposted by Dylan Pieper
I think pedocon theory is right. It’s empirically adequate, parsimonious, fits within a broader theoretical framework, and has immense explanatory breadth and depth www.liberalcurrents.com/we-need-to-t...
We Need to Talk About Pedocon Theory
The connection between Donald Trump and Jeffrey Epstein is no accident, but reveals a deep logic at the heart of reactionary politics.
www.liberalcurrents.com
July 29, 2025 at 11:10 AM
Reposted by Dylan Pieper
I am such a sucker for frivolous uses of AI. Here's an anthem for the tidyverse: suno.com/s/iVMVs4IoyA...
suno.com
July 11, 2025 at 8:35 PM
Reposted by Dylan Pieper
Very cool to see authors of this article mentioning the importance of sharing project-, data-, AND variable-level documentation alongside data in a repository, and linking to the templates I've provided on OSF as an example! 🌟

doi.org/10.1515/ling...
July 8, 2025 at 7:05 PM
Reposted by Dylan Pieper
As a data manager, good documentation not only helps me do my job better, but also helps me annoy you less! 😅

Good documentation about inclusion criteria, READMEs about oddities in the data, CONSORT diagrams and tracking to explain missing data, and so on, are all ways to ensure I bug you less! 🐛🐜🐝
June 27, 2025 at 5:46 PM
Reposted by Dylan Pieper
New to me is the term "premature closure", where you too quickly latch on to the first solution you see. Always a danger in coding, but particularly so today when LLMs can give you a plausible fix so so quickly.

www.shayon.dev/post/2025/16...
Pitfalls of premature closure with LLM assisted coding
When LLMs generate clean, professional-looking code, it's tempting to stop exploring alternatives. But therein lie the risks that come with premature closure. So what is premature closure?
www.shayon.dev
June 18, 2025 at 2:17 PM
Reposted by Dylan Pieper
Bleeding edge update for the #tidyverse purrr package with even more seamless #rstats parallel maps.

Introducing our shiniest new adverb: `in_parallel()`. Just wrap your function to take advantage of blazing fast parallel processing via mirai.

pak::pak("tidyverse/purrr")

purrr.tidyverse.org/dev/
Functional Programming Tools
A complete and consistent functional programming toolkit for R.
purrr.tidyverse.org
June 13, 2025 at 3:32 PM
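Here is a minimal sketch of the pattern the post describes. It assumes a purrr version that includes `in_parallel()` (the development version installed by the `pak::pak()` call above, at the time of the post) and the {mirai} package. `mirai::daemons()` starts the background R processes that `in_parallel()` dispatches to. The worker count of 2 is arbitrary:

```r
library(purrr)

# Start two background R processes (daemons) for purrr to use
mirai::daemons(2)

# Wrapping the function in in_parallel() asks map() to run it on the daemons;
# the function should be self-contained (no reliance on objects in the calling env)
squares <- map(1:4, in_parallel(\(x) x^2))

# Shut the daemons down again
mirai::daemons(0)

squares
```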
Reposted by Dylan Pieper
One cool thing you can/should do is sample from priors only, and plot the distribution of the actual quantity of interest (ex: risk ratio). I find this very useful. This is actually super easy with brms. arelbundock.com/posts/margin...
Prior Predictive Checks with marginaleffects and brms – Vincent Arel-Bundock
arelbundock.com
June 12, 2025 at 9:52 PM
Reposted by Dylan Pieper
Here's a functional programming trick for #rstats that I wish I started using sooner:

if you need a #ggplot2 scale to be reusable across multiple plots and dynamically configurable without relying on global state, consider using a function factory (a function that returns a function) to build it
May 29, 2025 at 11:36 PM
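Here is a minimal sketch of that idea in R with {ggplot2}. The factory name `make_fill_scale()`, the palette, and the example data are hypothetical. The factory captures its configuration once and returns a function that builds a fresh scale each time it is called, so no global options are involved:

```r
library(ggplot2)

# Function factory: configure once, get back a reusable scale builder
make_fill_scale <- function(palette, name = waiver()) {
  # force() evaluates the arguments now, so later changes to the
  # caller's variables don't leak into the returned function
  force(palette)
  force(name)
  function(...) {
    # Extra arguments (e.g. guide, limits) are passed through per plot
    scale_fill_manual(values = palette, name = name, ...)
  }
}

# Hypothetical brand palette, keyed by group level
brand_fill <- make_fill_scale(c(a = "#1b9e77", b = "#d95f02"), name = "Group")

df <- data.frame(group = c("a", "b"), n = c(3, 5))

# The same builder can be added to any number of plots
p1 <- ggplot(df, aes(group, n, fill = group)) + geom_col() + brand_fill()
p2 <- ggplot(df, aes(n, group, fill = group)) + geom_col() + brand_fill(guide = "none")
```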
Reposted by Dylan Pieper
mirai - minimalist async framework for #RStats - released as an 'r-lib' package.

Blog post: Advancing Async Computing in R.
shikokuchuo.net/posts/26-mir...

mirai provides event-driven async for #RShiny and parallel processing for purrr #tidyverse.

Really excited to be working on this at Posit!
shikokuchuo{net}: mirai 2.3.0
Advancing Async Computing in R
shikokuchuo.net
May 23, 2025 at 2:12 PM
Reposted by Dylan Pieper
tl;dr — this EO co-opts the language of open science to implement a system of political control wherein presidential appointees are given broad latitude to designate any number of reasonable scientific activities and inferences as scientific misconduct, and to penalize those involved accordingly.
Restoring Gold Standard Science
By the authority vested in me as President by the Constitution and the laws of the United States of America, including section 7301 of title 5, United
www.whitehouse.gov
May 24, 2025 at 9:28 PM
Reposted by Dylan Pieper
There's so much polarization around LLMs. They are way overhyped, I agree. But I also use them semi-regularly now.

Here's a thread of genuine use cases where I find them helpful. Please add your own!
May 20, 2025 at 7:51 PM
Reposted by Dylan Pieper
📦 I’m excited to share a new #rstats package I’ve been working on: {shinyfa} built to help folks working on large or unfamiliar #rshiny apps ✨

The package scans your app folders and extracts out details on render*(), reactive() and input$ to a dataframe!

📖 www.dalyanalytics.com/blog/shinyfa...
Introducing {shinyfa}: Analyze Large Shiny App Codebases Faster with This R Package | Daly Analytics
Discover {shinyfa}, a new R package designed to improve developer experience by analyzing and summarizing the structure of large Shiny applications. Perfect for consultants, teams, and contributors wo...
www.dalyanalytics.com
May 19, 2025 at 1:47 PM
Reposted by Dylan Pieper
Playing around with satellite imagery of #madison to make some office art. #Rstats
May 18, 2025 at 1:24 PM
Reposted by Dylan Pieper
✨ Use LLMs from #rstats with ellmer ✨ Version 0.2.0 is on CRAN now. No blog post yet because I'm about to go on vacation, but in the meantime you can check out the release notes: github.com/tidyverse/el....
github.com
May 18, 2025 at 2:13 PM
Reposted by Dylan Pieper
The kind of Friday morning content I needed to see. ❤️
Every data steward at a faculty meeting.
May 16, 2025 at 11:35 AM
Reposted by Dylan Pieper
Registration for the posit::conf(2025) virtual experience is now open!

Join us virtually, Sept 16–18, and access live-streamed keynotes and 100+ talks, on-demand recordings, Q&A sessions, and our virtual networking platform.

Learn more in the blog post: posit.co/blog/posit-c...

#RStats #Python
May 15, 2025 at 2:59 PM
Reposted by Dylan Pieper
In case you missed it, we recently updated some of our packages, including many new features (again) in the #rstats #easystats {modelbased} package:
easystats.github.io/modelbased/n...
Over the last few weeks, we have been working a lot on improving support and performance for Bayesian models and especially
Changelog
easystats.github.io
May 15, 2025 at 6:10 PM
Reposted by Dylan Pieper
I'm still thinking about my favorite quote from the Posit Data Science Hangout today. It perfectly sums up what I hope I provide to the researchers I work with: a trusted partner, who is there to support them in their work.

Earn a reputation for being a good person to work with
- Cara Thompson
May 8, 2025 at 6:58 PM
Reposted by Dylan Pieper
Great news! R/Medicine 2025 is providing a forum for sharing R-based tools and approaches used to analyze and gain insights from health data. Join us for the premier R conference for health and medicine.

🔗 Register today: rconsortium.github.io/RMedicine_we...

#rstats #opensource #RMed25
register – R/Medicine 2025
rconsortium.github.io
May 6, 2025 at 3:52 PM
Reposted by Dylan Pieper
I think a lot about what Carl Sagan said in one of his final interviews.
May 4, 2025 at 6:21 AM
I’m happy to share that I’ll be giving a talk at R/Medicine 2025! 🎊

I work with a BIG REDCap database for substance use treatment (200+ locations), which makes extraction difficult. I developed {redquack}, an #rstats 📦 that transfers REDCap data to DuckDB, and will talk about how to use it. 🦆
May 2, 2025 at 12:37 PM
📊 🕵️‍♂️ #rstats community! Do you sometimes feel like you're just pretending to be a data scientist? I'm researching imposter syndrome for my upcoming talk at posit::conf(2025)

🔍 I'd love to hear YOUR experiences in a short 5-10 minute anonymous survey: forms.gle/YkJtwZWquyKM...

Please share! 🔄
Imposter Syndrome in Data Science
This survey is intended to gather community feedback from data scientists and students or recent graduates interested in data science as a career. Your responses will be anonymous and may be used for...
forms.gle
May 1, 2025 at 4:00 PM