Donald Szlosek
@dszlosek.bsky.social
850 followers 4.2K following 180 posts
Biostatistician @IDEXX formerly at harvardmed, @BIDMChealth, @nasa. Big data, clinical trials, and medical diagnostics. Mainer. Opinions are my own. he/him
dszlosek.bsky.social
Thinking about joint probability regions
(like 0<y<x−1) and how we compute their probabilities via double integrals.

In principle, Green’s theorem could turn that into a line integral around the boundary — so why don’t we ever do that? Loss of probabilistic meaning, or just unnecessary machinery?
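For reference, a sketch of what that boundary form could look like. The symbols here are assumptions, not from the post: f is the joint density, D is a bounded region with a piecewise-smooth, positively oriented boundary, and Q is one possible choice of antiderivative (the partial derivative in y of the joint CDF). Other choices of the vector field work too.

```latex
% Green's theorem with P = 0 and Q chosen so that dQ/dx = f
\[
  Q(x, y) = \int_{-\infty}^{x} f(t, y)\, dt
  \quad\Longrightarrow\quad
  \frac{\partial Q}{\partial x} = f(x, y),
\]
\[
  \Pr\big((X, Y) \in D\big)
  = \iint_{D} f(x, y)\, dA
  = \oint_{\partial D} Q(x, y)\, dy .
\]
```

Note that evaluating Q already requires an integral of f, so the line integral does not remove the integration work. A region such as 0<y<x−1 is also unbounded, so it would have to be truncated and handled with a limit before Green's theorem applies.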
Reposted by Donald Szlosek
statsepi.bsky.social
I read and write, I explore and I question, I design and script and analyse, I interpret and communicate. I do this to train my mind in the hopes of one day generating new knowledge. New knowledge that might even be useful, and that no algorithm can yet be trained on.
hormiga.bsky.social
Y'all. I just got ChatGPT to do everything in R for this manuscript. I mean EVERYTHING. And it's all legit and reproducible. I'm shook.

How are we mentoring our trainees in statistics now? Who needs to learn coding in R line by line, and who doesn't?

scienceforeveryone.science/statistics-i...
Statistics in the era of AI
How do we mentor, teach, and do stats when AI can do so much of the work?
scienceforeveryone.science
Reposted by Donald Szlosek
datavisfriendly.bsky.social
#rstats #dataviz
A "line-up" test has been proposed as a human significance test: Can an observer spot a difference that rejects a null hypothesis?

Here are glyphs for 20 penguins, representing the main variables with visual features.

There are THREE multivariate outliers here. CAN YOU FIND THEM?
Reposted by Donald Szlosek
rconsortium.bsky.social
Coming up next month, register now!

R+AI 2025 - Nov 12-13

Keynote: Joe Cheng, CTO @ Posit

Talk: “Keeping LLMs in Their Lane: Focused AI for Data Science and Research”

Register now!
rconsortium.github.io/RplusAI_webs...

#rstats #AI #DataScience
@posit.co @jcheng5.bsky.social
Reposted by Donald Szlosek
pwgtennant.bsky.social
Just because an LLM can produce a report with various figures & charts doesn't mean it is good at statistics.

Because good statistics is not about producing code.

It's about deep knowledge of study design & conduct. In my opinion, 95% of all data science problems come from poor questions & design.
Reposted by Donald Szlosek
tslumley.bsky.social
Oh look. X is strongly correlated with rank(X)
#AxesOfEvil
Reposted by Donald Szlosek
jessicahullman.bsky.social
Think of how much better off we'd be if every established researcher got in the habit of writing papers entitled "Second thoughts on [thing I'm famous for]"
dszlosek.bsky.social
Excellent piece by Miryam Naddaf discussing a surge in papers that likely use LLMs on open data. This is concerning since I work on one of those with some of these datasets (NHANES, CDC WONDER, BRFSS). www.nature.com/articles/d41...
Reposted by Donald Szlosek
statsepi.bsky.social
An 8-year-old blog post on causal thinking in epidemiology that I'm sharing for no particular reason (ICYMI).

darrendahly.github.io/post/2017-02...
Cause vs. Consequence
Principal Statistician | Senior Lecturer
darrendahly.github.io
dszlosek.bsky.social
#academics #AcademicSky
dszlosek.bsky.social
Excellent advice on paper review:

1. Peer reviewers are volunteers.

2. Map all comments to actions.

3. Address all comments.

4. Focus on improving your paper, instead of arguing.

5. Rarely, and only with strong defense, say no.

6. Don’t take things personally.

7. Avoid recreational revisions.
dszlosek.bsky.social
S-Values are much more interpretable than P-values, yet adoption seems near impossible. I wonder what it would take to make the leap? #statssky #episky #rstats #statistics
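For readers who have not met S-values: the usual definition is the Shannon surprisal of the P-value, s = -log2(p), measured in bits. The snippet below is a minimal sketch of that conversion; the function name `s_value` is a hypothetical helper, not something from the post or from any library.

```python
import math


def s_value(p: float) -> float:
    """Convert a P-value to an S-value (surprisal in bits): s = -log2(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


if __name__ == "__main__":
    # Print the S-value for a few commonly quoted P-values.
    for p in (0.5, 0.25, 0.05, 0.005):
        print(f"p = {p:<6} -> s = {s_value(p):.2f} bits")
```

On this scale p = 0.05 is about 4.3 bits, roughly as surprising as seeing four heads in a row from a fair coin.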
Reposted by Donald Szlosek
tylermw.com
"Man, I really wish RStudio respected hierarchy in code-folded section headers... I wonder how easy it would be to..."

(inner voice: DON'T DO IT! IT'S NOT WORTH IT! JUST GET BACK TO WORK! THE YAK IS BEST LEFT UNSHORN!)

"... I'm gonna do it."

#RStats #RStudio
Reposted by Donald Szlosek
georgiatomova.bsky.social
We should do a study on how much of the funded applied research suffers from problems that the unfunded methods research could have helped prevent or resolve
dszlosek.bsky.social
My personal favorite seed setup is set.seed(666) # \m/ rock on. Any others out there? #rstats #databs
swampthingpaul.bsky.social
While digging through some code from a manuscript I recently read (yes, that rabbit hole), I came across this line, and I think I just found my new favorite set.seed(...) 🤣

set.seed(i+42) # Don’t Panic. “What is the meaning of life, the universe, and everything?”

#Rstats
Reposted by Donald Szlosek
pwgtennant.bsky.social
"Uncooperative statistician": the term used (typically by a senior clinician) to describe a well-trained and knowledgeable statistician who refuses to conduct flawed or fraudulent research.
Reposted by Donald Szlosek
andrew.heiss.phd
If you've ever wanted to learn how to make beautiful websites with #QuartoPub and #rstats, check out this workshop I'm giving in a couple weeks! It'll be a blast (and we're covering Quarto's brand new _brand dot yaml system!)
stathorizons.bsky.social
Learn to create and publish a professional, data-focused website in “Create an Online Presence with Quarto Websites” on October 16-17, with @andrew.heiss.phd! Discover how to use #Quarto to build a variety of websites like personal portfolios, research compendiums, and interactive dashboards.
Quarto Websites | Online Seminar | Code Horizons
This online course taught by Andrew Heiss, Ph.D., teaches you how to use Quarto to build a variety of data-focused websites.
codehorizons.com
Reposted by Donald Szlosek
hetanshah.bsky.social
Nice chart from @ourworldindata.org showing the contrast between what Americans die of (heart disease and cancer) vs. what the US media reports on (homicide and terrorism). This naturally makes it trickier to build a fact-based world view.
ourworldindata.org/does-the-new...
[Chart: What Americans die from and the causes of death the US media reports on. Panels: causes of death in the US in 2023; media coverage of these causes of death in 2023 in The New York Times, The Washington Post, and Fox News.]
Note: Based on the share of causes of death in the US and the share of mentions for each of the causes in the New York Times, the Washington Post and Fox News. All values are normalized to 100%, so the shares are relative to all deaths caused by the 12 most common causes + drug overdoses, homicides and terrorism. These causes account for more than 75% of deaths in the US.
A "media mention" is a published article in one of the outlets which mentions the cause (e.g. "influenza") or related keywords (e.g. "flu") at least twice.
Data sources: Media mentions from Media Cloud (2025); deaths data from the US CDC (2025) and Global Terrorism Index.
CC BY
dszlosek.bsky.social
If I'm doing a lot of written work, I would get tired of writing the sigma notation or the triangular numbers you suggested. I see this as an easy and fast way of saving my wrists from cramping!
Reposted by Donald Szlosek
solomonkurz.bsky.social
In Ch 19 (nyu-cdsc.github.io/learningr/as...) of his 2nd edition, Kruschke used *residual* SD as a standardizer for group differences from a multilevel ANCOVA. Is there any precedent for using a *residual* SD as a standardizer for a standardized mean difference effect size? #RStats
nyu-cdsc.github.io