A lot of tests that were "good enough" for human research (the Remote Associates Test, or RAT, for creativity; Reading the Mind in the Eyes for empathy) are not robust enough to be benchmarks for AI.