Vilgot Huhn
@vilgothuhn.bsky.social
Confused PhD student in psychology at Karolinska Institutet, Stockholm. GAD, ICBT, mechanisms of change. Organizing the ReproducibiliTea JC at KI.
Website: https://vilgot-huhn.github.io/mywebsite/
Personal blog at unconfusion.substack.com
Yes, that’s why I made it even. I just added the mean out of dumb curiosity and thought it looked quirky.

bsky.app/profile/vilg...
November 26, 2025 at 5:40 AM
It's not exact so there's probably nothing actually interesting here though.
November 25, 2025 at 7:57 PM
The point of the simulation was to look at how the median p is equal to the alpha level (the red and blue overlapping lines) at 50% power. I just added the mean out of curiosity.
November 25, 2025 at 7:57 PM
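A minimal sketch of a simulation like the one described above, assuming a two-sample t-test; the per-group n and effect size are illustrative choices that give roughly 50% power, not the original settings. At 50% power, half of all p-values land below alpha by definition, so the median p equals alpha:

```python
# A sketch, not the original simulation: all parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, sims = 0.05, 50, 20000
d = 0.394  # standardized effect giving ~50% power at n = 50 per group

x = rng.normal(0.0, 1.0, (sims, n))  # group 1: centered on zero
y = rng.normal(d, 1.0, (sims, n))    # group 2: shifted by d
pvals = stats.ttest_ind(x, y, axis=1).pvalue

print(np.mean(pvals < alpha))  # ~0.5: power is about 50%
print(np.median(pvals))        # ~0.05: the median p equals alpha
```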
By all means eyeball the scatterplot. Never skip eyeballing the scatterplot. A lot of scatterplots are visibly weird in ways that mean OLS is unsuitable. That’s not what I’m talking about here.
November 25, 2025 at 5:59 PM
this one's for you @bureaucracynow.bsky.social
November 25, 2025 at 2:27 PM
Thanks for engaging. I am, however, landing on the conclusion that the paradox is not itself a justification for interpreting p=0.048 as evidence *for* the null, at least not under normal circumstances. Relatedly, I dug up a response from Lakens re: that idea on the other site:

x.com/lakens/statu...
November 25, 2025 at 12:30 PM
My interpretation of your statement here is that even in a well-designed study with decent power, I should interpret p=0.048 as evidence that there is nothing there. Is that a correct reading of what you mean?

bsky.app/profile/stea...
But with decent power, p=0.048 is evidence for the null regardless of how well it's designed. (Also in this case it's a 3-way interaction...)
November 23, 2025 at 7:28 PM
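A rough sketch of the mechanism behind the quoted claim, under loud assumptions: a two-sample t-test, a specific H1 of d = 0.5, and n = 200 per group (very high power). Under H0 the p-value is uniform, so its density in any thin bin is 1; under a sufficiently well-powered specific H1 the density near p = 0.048 can fall well below 1, making such a p more probable under the null than under that alternative:

```python
# Illustrative sketch; the specific H1 (d = 0.5) and n = 200/group are
# assumptions, chosen so the test has very high (~99.9%) power.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, d, sims = 200, 0.5, 50000
x = rng.normal(0.0, 1.0, (sims, n))
y = rng.normal(d, 1.0, (sims, n))
p_h1 = stats.ttest_ind(x, y, axis=1).pvalue

lo, hi = 0.04, 0.05  # a thin bin around p = 0.048
dens_h1 = np.mean((p_h1 > lo) & (p_h1 < hi)) / (hi - lo)
dens_h0 = 1.0        # p is uniform under H0, so its density is 1 everywhere
print(dens_h1 / dens_h0)  # well below 1 here: the bin favors H0 over this H1
```

Note that with these settings the ratio dips below 1 only because power is very high; at more moderate power the density near 0.048 is still above 1, which is part of what the thread is debating.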
But when one levels this critique, that seems to entail specifying an H1 larger than the observed effect, right? (I’m thinking the observed effect implies the observed p-value.) So then there has to be some basis for doing that.
November 23, 2025 at 7:25 PM
In that statement, does H1 refer to an assumed true effect, with an associated power (conditional on n, study design, etc)?
Or does H1 refer to "true effect does not equal zero"?
If it's the second one, I'm learning something new and important, because that is not how I've thought of the paradox.
November 23, 2025 at 6:06 PM
Yes, as I wrote earlier in the thread, ideally you justify your threshold (but this is usually not done in a thought-through way). But the thing I'm stuck on here is interpreting p=0.048 as evidence *for* H = 0. I'm not sure I see how those two issues connect.
November 23, 2025 at 5:11 PM
Thanks! I'm trying to be humble as I'm just a PhD student, but honestly I find it a bit hard to reconcile "being interested in estimating a precise meta-analytic SMD" with "not reacting when one of the SMDs is impossible". (Maybe they outsourced the data entry to someone else and then didn't check.)
November 23, 2025 at 1:21 PM
Maybe I should again clarify that I do agree that the case you present here is suspicious, mostly since there are several p-values in that close-to-threshold area at once.
November 23, 2025 at 12:32 PM
I think it makes sense to ask what sort of effects we care about and whether the test is overpowered for the smallest effect of interest; in that case it makes sense to me to react to a "close to threshold" p-value. Also, your point about the p picking up on model violations is 👍👍 6/6
November 23, 2025 at 12:32 PM
that's different. The scenario @esolomon.bsky.social stipulated was "a well-motivated, well-controlled study". In that scenario I don't think one should circle the p-value for being too close to the threshold. That's removing its role as a threshold. Am I missing something here? 5/6
November 23, 2025 at 12:32 PM
The second thing is what power means. AFAIK "power" only exists for a given effect. The paradox happens when we look at a thin bin and the test has high power (e.g. the true effect is large, or n is large, or combos of these). But if we don't know the true effect and ask simply whether H ≠ 0, 4/6
November 23, 2025 at 12:32 PM
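To make the "power only exists for a given effect" point above concrete, a small sketch (illustrative n and effect sizes) computing the power of the same two-sided two-sample t-test under different assumed true effects:

```python
# A small sketch: power is a function of an assumed true effect, not a
# property of the test alone. n and the effect sizes are illustrative.
import numpy as np
from scipy import stats

def ttest_power(d, n, alpha=0.05):
    """Power of a two-sided two-sample t-test (n per group) when the
    true standardized effect is d."""
    df = 2 * n - 2
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    ncp = d * np.sqrt(n / 2)  # noncentrality parameter
    return (1 - stats.nct.cdf(tcrit, df, ncp)
            + stats.nct.cdf(-tcrit, df, ncp))

for d in (0.1, 0.2, 0.5, 0.8):
    print(f"d = {d}: power = {ttest_power(d, n=100):.3f}")
# Same test, same n = 100/group; power runs from ~0.11 to ~1.0 with d.
```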
As far as I've understood, N-P frequentism is ideally about justifying a threshold (which, to be fair, is usually not done) and then treating that threshold as a threshold. 3/6
November 23, 2025 at 12:32 PM
I seem to have drawn different conclusions about the practical implications.

I think a few things are happening here that might be the junctures where our perspectives diverge.
The first one is how to categorize a p=0.048. Does it belong to the bin 0.04-0.05 or to the bin 0-0.05? 2/6
November 23, 2025 at 12:32 PM
Thank you for your reply!
I think this is a very conceptually important debate, so I hope it's ok if I'm being a bit persistent about it. I've read Daniel's writings on Lindley's paradox before and while I found it very interesting, 1/6
November 23, 2025 at 12:32 PM
To be clear, I'm not talking about this particular oxytocin claim, which I would wager is not a thing.
November 22, 2025 at 11:22 PM
Isn't this pushing it too far? With a good design, if the null happens to be true, you would see values below the threshold rarely. If there were an effect, they'd be more common. How does that turn into evidence for the null (as opposed to non-null, not as opposed to a specific alternative effect)?
November 22, 2025 at 11:21 PM