tj ♏️ahr
@tjmahr.com
data scientist studying how kids learn to speak, dad, jump roper, bayesian, tjmahr.com
tjmahr.com
after writing out a working parser in R, i asked chatgpt to translate the tokenizer to cpp11 and got a 3x speed increase. (of course, i reviewed the output and have the two parsers unit testing against each other)
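A minimal sketch of the "two implementations testing against each other" idea. Both functions here are hypothetical stand-ins written in plain R, not the readtextgrid code; in the setup described above, the second one would be the compiled cpp11 tokenizer. The whitespace-only splitting rule is also an assumption made for illustration.

```r
# Reference tokenizer (hypothetical): pull out runs of non-whitespace.
tokenize_reference <- function(text) {
  regmatches(text, gregexpr("[^ \t\n]+", text))[[1]]
}

# Stand-in for the translated tokenizer (hypothetical): split on
# whitespace runs and drop the empty string left by leading whitespace.
tokenize_fast <- function(text) {
  tokens <- strsplit(text, "[ \t\n]+")[[1]]
  tokens[nzchar(tokens)]
}

# Inputs chosen to poke at edge cases: empty string, leading and
# trailing whitespace, mixed whitespace characters.
test_inputs <- c(
  "",
  "a b",
  "  leading",
  "xmin = 0 \n xmax = 1.5",
  "tabs\tand\nnewlines  "
)

# Every input must produce identical tokens from both implementations.
for (input in test_inputs) {
  stopifnot(identical(tokenize_reference(input), tokenize_fast(input)))
}
```

The value of this kind of check is that neither implementation has to be trusted on its own: any disagreement points at an input worth adding to the permanent test suite.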
tjmahr.com
what if Elsa kept hearing the four-note shorthand for death all movie? uh-huh
tjmahr.com
it’s easier than that because it’s just a data-summary problem on posterior draws too
jordannafa.bsky.social
We just sample from the posterior predictive distribution under each of the potential outcomes and calculate arbitrary quantities by integrating over their contrasts. It's remarkably more straightforward than any flow chart "test" selection nonsense 🤷‍♂️
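A rough sketch of what "a data-summary problem on posterior draws" can look like. The matrices below are simulated stand-ins for posterior predictive draws (rows are draws, columns are units); in practice they would come from a fitted model, and all names and numbers here are assumptions for illustration.

```r
set.seed(42)
n_draws <- 1000
n_units <- 50

# Hypothetical posterior predictive draws under each potential outcome.
pred_control <- matrix(rnorm(n_draws * n_units, mean = 10, sd = 2), nrow = n_draws)
pred_treated <- matrix(rnorm(n_draws * n_units, mean = 12, sd = 2), nrow = n_draws)

# One average contrast per draw: average over units within each draw.
effect_draws <- rowMeans(pred_treated - pred_control)

# From here on it is ordinary data summary on a vector of draws.
effect_summary <- c(
  mean = mean(effect_draws),
  lower = unname(quantile(effect_draws, 0.05)),
  upper = unname(quantile(effect_draws, 0.95)),
  prob_positive = mean(effect_draws > 0)
)
effect_summary
```

Any other quantity (a ratio, a threshold probability) follows the same pattern: compute it within each draw, then summarize across draws.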
Reposted by tj ♏️ahr
reshetz.bsky.social
20 hours still no electricity + almost set myself on fire with a candle
Reposted by tj ♏️ahr
reichlinmelnick.bsky.social
My first thought was “wow I hope this guy didn’t give out his name because he could absolutely be charged” and yup, he is correct to stay anonymous.

This man’s account is shocking. His neighbors’ situation is shocking. The entire Chicago raid is shocking.
On the other side was a mom and her 7-year-old daughter, pleading for his help.
"I wasn't planning on letting her stay, but I didn't know what the hell was going on," the man said of his Venezuelan migrant neighbors. But he quickly relented. The little girl was inconsolable and hid under his bed.
"I didn't want them to take her," said the man, who didn't want to be named because he fears he'll be targeted by federal authorities for his actions.
"I gave her my bedroom, and I just told her, 'Just stay there. Don't open, don't, shh, just stay quiet,'" he recalled telling the mom and daughter as he choked back tears.
Reposted by tj ♏️ahr
emilymoin.com
I am open to the idea that there are people who don't have and don't want to gain the skills to engage directly with their data but every single day that I do I learn the answer to a question you'd never even think to ask unless you were personally staring into the abyss of an uncleaned dataset.
tjmahr.com
i should have thrown in some exclamation marks in the first post in this thread because this update closes a 5-year-old issue and that makes me happy github.com/tjmahr/readt...
tjmahr.com
thanks to @jofrhwld.bsky.social for writing the first version of the parser. i overhauled it after making a pathologically bad textgrid that praat.exe could still open and then making the parser handle that evil file github.com/tjmahr/readt...
tjmahr.com
A YouTube GitHub tutorial showed how to make a pull request by making an edit to a popular JS repo. youtu.be/YFkeOBqfQBw?...
2 years later it's still happening
YouTube video by ThePrimeTime
youtu.be
tjmahr.com
i got another performance boost by using `substring(text, token_start, i - 1)` to pull out substrings rather than paste characters together. but then the tokenizer couldn't handle characters with diacritics correctly and i had to revert it
tjmahr.com
tokenizer grew strings character by character on each loop `cur_value <- c(cur_value, s)`. sped up a 35 ms function by 6 ms by instead tracking the starting index of the current token and then putting the sequence together when needed
`cur_value <- all_char[seq(token_start, i - 1)]`
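The two strategies can be sketched side by side. This is a toy space-delimited tokenizer, not the actual parser; only `cur_value`, `all_char`, `token_start`, `s`, and `i` come from the posts, and the function names and splitting rule are assumptions.

```r
# Strategy 1: grow the current token one character at a time.
collect_growing <- function(all_char) {
  tokens <- character(0)
  cur_value <- character(0)
  for (s in all_char) {
    if (s == " ") {
      if (length(cur_value) > 0) {
        tokens <- c(tokens, paste0(cur_value, collapse = ""))
      }
      cur_value <- character(0)
    } else {
      cur_value <- c(cur_value, s)  # reallocates on every character
    }
  }
  if (length(cur_value) > 0) {
    tokens <- c(tokens, paste0(cur_value, collapse = ""))
  }
  tokens
}

# Strategy 2: remember where the token started, slice once at the end.
collect_by_index <- function(all_char) {
  tokens <- character(0)
  token_start <- NA_integer_
  for (i in seq_along(all_char)) {
    if (all_char[i] == " ") {
      if (!is.na(token_start)) {
        cur_value <- all_char[seq(token_start, i - 1)]
        tokens <- c(tokens, paste0(cur_value, collapse = ""))
        token_start <- NA_integer_
      }
    } else if (is.na(token_start)) {
      token_start <- i
    }
  }
  if (!is.na(token_start)) {
    cur_value <- all_char[seq(token_start, length(all_char))]
    tokens <- c(tokens, paste0(cur_value, collapse = ""))
  }
  tokens
}

all_char <- strsplit("xmin = 0", "", fixed = TRUE)[[1]]
stopifnot(identical(collect_growing(all_char), collect_by_index(all_char)))
```

The second version does one subsetting operation per token instead of one vector copy per character, which is where a saving like the one described would plausibly come from.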
tjmahr.com
I get it! This was a major version change and a big overhaul. This is when stuff can break. For my old plots, I just need to figure out whatever theme() calls I need to just set the font size, line height, and whatever else base_size touched before.
tjmahr.com
yeah, I end up simulating 1000 new participants on each posterior draw using the random effect covariance matrix to get a population mean on each draw. It used to be scary but posterior::rvar made it super straightforward
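A simplified base-R sketch of that simulation, under loud assumptions: a logistic model with a single random intercept, so the "covariance matrix" collapses to one standard deviation, and the posterior draws are faked with `rnorm()` rather than taken from a fitted model. It also uses a plain loop instead of `posterior::rvar`; all column names are hypothetical.

```r
set.seed(1)
n_draws <- 200
n_new <- 1000

# Hypothetical posterior draws of the fixed intercept and the
# random-intercept SD (stand-ins for draws from a real model).
draws <- data.frame(
  b_intercept = rnorm(n_draws, mean = 0.5, sd = 0.1),
  sd_intercept = abs(rnorm(n_draws, mean = 1, sd = 0.1))
)

# For each draw: simulate new participants, convert to the probability
# scale, and average to get one population mean for that draw.
pop_mean <- vapply(seq_len(n_draws), function(d) {
  u <- rnorm(n_new, mean = 0, sd = draws$sd_intercept[d])
  mean(plogis(draws$b_intercept[d] + u))
}, numeric(1))

# The result is a posterior distribution for the population mean.
quantile(pop_mean, c(0.05, 0.5, 0.95))
```

With correlated random intercepts and slopes, the `rnorm()` call would be replaced by a multivariate normal draw using that posterior draw's covariance matrix.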
mjskay.com
no you just:

1. fit a big ol' complicated model that represents as much of the data generating process as you can, then

2. marginalize all that shit into the trash and report means
tjmahr.com
and there it is. pointsize scales with font size

patchwork::wrap_plots(
  ggplot(mtcars) +
    geom_point(aes(wt, mpg)) +
    theme_grey(),
  ggplot(mtcars) +
    geom_point(aes(wt, mpg)) +
    theme_grey(19)
)
the same plot but with different text and therefore point size
tjmahr.com
you’re allowed to do breaking changes on major versions. this change just didn’t stick out to me in the announcement blogpost
tjmahr.com
it does kinda suck that every pre-4.0.0 ggplot2 plot that set base_size in its theme_*() call will look different now
Reposted by tj ♏️ahr
junoryleejournalism.com
David Simon, creator of ‘The Wire’, being interviewed by Ari Shapiro (NPR)
SHAPIRO: OK, so you've spent your career creating television without AI, and I could imagine today you thinking, boy, I wish I had had that tool to solve those thorny problems...
SIMON: What?
SHAPIRO: ...Or saying...
SIMON: You imagine that?
SHAPIRO: ...Boy, if that had existed, it would have screwed me over.
SIMON: I don't think AI can remotely challenge what writers do at a fundamentally creative level.
SHAPIRO: But if you're trying to transition from scene five to scene six, and you're stuck with that transition, you could imagine plugging that portion of the script into an AI and say, give me 10 ideas for how to transition this.
SIMON: I'd rather put a gun in my mouth.
tjmahr.com
for reference,

- old default point size was 1.5
- old default linewidth was .5
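One possible way to keep the bigger text while pinning geoms to those old defaults, sketched under the assumption that ggplot2 4.0.0 or later is installed and that `element_geom()` with its `pointsize` and `linewidth` arguments is the right knob for this. This is a guess at a workaround, not something confirmed in the thread.

```r
library(ggplot2)  # assumes ggplot2 >= 4.0.0, which provides element_geom()

# Larger base font size, but with geom defaults set back to the
# pre-4.0.0 values listed above (point size 1.5, linewidth 0.5).
p_restored <- ggplot(mtcars) +
  geom_point(aes(wt, mpg)) +
  theme_grey(19) +
  theme(geom = element_geom(pointsize = 1.5, linewidth = 0.5))
```

Layers that set `size` or `linewidth` explicitly should be unaffected either way, since the theme only supplies defaults.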
tjmahr.com
but i am used to scaling point size by shrinking the size of the plotting area. ggsave() with smaller height and width caused shapes to be bigger. now my points are massive.