Alon Zivony
alonzivony.bsky.social
Lecturer of Psychology, studying visual attention (with EEG) and prejudice against LGBTQ+ @ University of Sheffield (he/him)

Visit the Sheffield PandA Lab:
https://sites.google.com/sheffield.ac.uk/panda-lab/home
I missed academic memes on my feed!
September 23, 2025 at 9:43 AM
yeah, same. Up until now I just used repeated measures. But recently I've been venturing to individual differences and between group comparisons. I only now realise that things are much murkier than I knew.
September 18, 2025 at 7:10 PM
I just think that we're not trained on thinking what reliability does to our between-group effect sizes. Measurement error, yes. Reliability, no.
September 18, 2025 at 6:27 PM
Exactly. Only that low reliability essentially massively increases within-group variability.

If a measure is unreliable and you happen to find very low within-group variability, the low variability might have happened accidentally. So whoever tries to replicate will probably get a different outcome.
September 18, 2025 at 6:14 PM
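(A quick numpy sketch of this point, with illustrative numbers. Defining reliability as true-score variance over observed-score variance, a reliability of .5 doubles the within-group variance:)

```python
import numpy as np

rng = np.random.default_rng(0)
rel = 0.5                              # illustrative reliability of the measure
noise_sd = np.sqrt((1 - rel) / rel)    # error SD that yields this reliability

# Observed score = true score + measurement error, so observed
# within-group variance = true variance / reliability.
true_scores = rng.standard_normal(1_000_000)
observed = true_scores + noise_sd * rng.standard_normal(1_000_000)
print(round(observed.var(), 1))        # ≈ 1 / rel = 2.0
```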
4. To make matters worse: large effects are often not very reliable. For example, the Stroop effect is easy to find, but the size of the effect changes wildly from one measurement to the next. So any between-group comparison of Stroop requires a large sample size to be replicable.
September 18, 2025 at 5:53 PM
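(To put rough numbers on "requires a large sample size": a back-of-the-envelope power sketch. The effect size and reliabilities below are hypothetical, and the normal approximation slightly underestimates the exact t-test n:)

```python
from math import ceil, sqrt

def n_per_group(d, z_alpha=1.959964, z_beta=0.841621):
    """Approximate n per group for a two-sided two-sample comparison
    (normal approximation, alpha = .05, power = .80)."""
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

d_true = 0.5                 # hypothetical true group difference
for rel in (1.0, 0.7, 0.4):  # hypothetical reliabilities of the score
    d_obs = d_true * sqrt(rel)   # attenuated effect we'd actually observe
    print(f"rel={rel}: n per group = {n_per_group(d_obs)}")
```

Dropping reliability from 1.0 to 0.4 here more than doubles the required sample.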
3. So the same rationale from correlations applies.

If our DV is highly unreliable, then effect sizes are small by default because low reliability=noise. The results of our between-group comparison are just not very replicable.

The only solution is huge sample sizes to allow for smaller effects.
September 18, 2025 at 5:44 PM
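(A minimal simulation of this shrinkage, with made-up numbers: a true group difference of d = 0.8 measured with reliability .25 shows up as roughly d_true × √rel = 0.4:)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
d_true = 0.8          # true between-group effect in true-score SD units
rel = 0.25            # illustrative reliability of the DV

# True scores: group B is shifted by d_true standard deviations.
a_true = rng.standard_normal(n)
b_true = rng.standard_normal(n) + d_true

# Measurement error inflates observed SD to 1 / sqrt(rel).
noise_sd = np.sqrt((1 - rel) / rel)
a_obs = a_true + noise_sd * rng.standard_normal(n)
b_obs = b_true + noise_sd * rng.standard_normal(n)

pooled_sd = np.sqrt((a_obs.var(ddof=1) + b_obs.var(ddof=1)) / 2)
d_obs = (b_obs.mean() - a_obs.mean()) / pooled_sd
print(round(d_obs, 2))   # ≈ d_true * sqrt(rel) = 0.8 * 0.5 = 0.40
```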
2. Unless we're manipulating the groups, a between-groups test is essentially a correlation. Think about age groups: instead of correlating a measure with age as a continuous variable, we just bunch people of different ages together into a single group.
September 18, 2025 at 5:44 PM
Maybe this will help for intuition:
1. If we have no reliability (test-retest r = 0), that means that any correlation we're finding is not replicable. After all, if a measure is not correlated with itself, how can it correlate with any other measure?
Weak reliability = less replicable correlation
September 18, 2025 at 5:44 PM
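(The classic attenuation result behind point 1 — r_observed ≈ r_true × √(rel_x × rel_y) — in a small numpy simulation with illustrative numbers:)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# True scores for two constructs, correlated at r_true = 0.5.
r_true = 0.5
true_x = rng.standard_normal(n)
true_y = r_true * true_x + np.sqrt(1 - r_true**2) * rng.standard_normal(n)

# Add measurement error so each observed score has reliability 0.6
# (reliability = true-score variance / observed-score variance).
rel = 0.6
noise_sd = np.sqrt((1 - rel) / rel)
obs_x = true_x + noise_sd * rng.standard_normal(n)
obs_y = true_y + noise_sd * rng.standard_normal(n)

r_obs = np.corrcoef(obs_x, obs_y)[0, 1]
print(round(r_obs, 2))   # ≈ r_true * sqrt(rel * rel) = 0.5 * 0.6 = 0.30
```

In the limiting case of zero reliability the observed correlation goes to zero, which is the point above.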
Can you share what you wrote?

This is what I found from a quick search (I can't find if the second was published):
pmc.ncbi.nlm.nih.gov/articles/PMC...
core.ac.uk/download/pdf...
September 18, 2025 at 5:28 PM
okay, I saw some not-so-well-cited papers. But my question is: why not correct for reliability? And why isn't it done routinely? We are essentially throwing away power by not tracking how (un)reliable our DVs are and adjusting our error terms accordingly.
September 18, 2025 at 3:34 PM
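(The standard correction here is Spearman's disattenuation formula; a minimal sketch with made-up numbers:)

```python
from math import sqrt

def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: estimate the true-score
    correlation from the observed correlation and the reliabilities
    of the two measures."""
    return r_xy / sqrt(rel_x * rel_y)

# An observed r = .30 between measures with reliabilities .60 and .75
# implies a true-score correlation of about .45.
print(round(disattenuate(0.30, 0.60, 0.75), 2))
```

One caveat: with noisy reliability estimates the corrected value can overshoot (even past 1), which is one reason the correction is used cautiously.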
I always love helping students understand what the bare-bones assumptions of psychology as an empirical science are: that there's an objective reality, even for abstract concepts, and that our goal is to uncover it using methods and statistics. I find it helps with the "is psychology a science" debate.
August 19, 2025 at 12:42 PM
This might seem of no consequence whatsoever, but... if you're trying to use Excel to calibrate timestamps from two different machines to the millisecond, being half a second off can be a big deal!

Excel, who asked you to round up the seconds???
April 10, 2025 at 9:45 PM
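(For anyone hitting the same wall: Excel stores a timestamp as a fractional day count, so the sub-second precision survives — it's the default display format that rounds to whole seconds. A custom number format like hh:mm:ss.000 shows the milliseconds. A sketch of the conversion in Python, assuming the Windows Excel epoch of 1899-12-30:)

```python
from datetime import datetime, timedelta

# Windows Excel's day-zero (the 1900 date system, with its Lotus
# leap-year quirk baked in).
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial timestamp (fractional days) to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)

# 0.5 days = 12:00:00, plus 1.2345 s expressed in days: the milliseconds
# are recoverable from the serial even when the cell displays 12:00:01.
t = excel_serial_to_datetime(0.5 + 1.2345 / 86_400)
print(t.time())   # ~12:00:01.2345
```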