Although it’s the right thing to do, it’s never easy to admit error — particularly when you’re an extremely high-profile scientist whose work is being dissected publicly. So while it’s not a retraction, we thought this was worth noting: A Nobel Prize-winning researcher has admitted on a blog that he relied on weak studies in a chapter of his bestselling book.
The blog — by Ulrich Schimmack, Moritz Heene, and Kamini Kesavan — critiqued the citations included in a book by Daniel Kahneman, a psychologist whose research has illuminated our understanding of how humans form judgments and make decisions and earned him half of the 2002 Nobel Prize in Economics.
According to the blog by Schimmack et al.,
…readers of his [Kahneman’s] book “Thinking Fast and Slow” should not consider the presented studies as scientific evidence that subtle cues in their environment can have strong effects on their behavior outside their awareness.
Remarkably, Kahneman took the time to post a detailed response to the blog, writing:
What the blog gets absolutely right is that I placed too much faith in underpowered studies. As pointed out in the blog, and earlier by Andrew Gelman, there is a special irony in my mistake because the first paper that Amos Tversky and I published was about the belief in the “law of small numbers,” which allows researchers to trust the results of underpowered studies with unreasonably small samples. We also cited Overall (1969) for showing “that the prevalence of studies deficient in statistical power is not only wasteful but actually pernicious: it results in a large proportion of invalid rejections of the null hypothesis among published results.” Our article was written in 1969 and published in 1971, but I failed to internalize its message.
We contacted Kahneman, who confirmed that he indeed posted this response on the blog Replicability-Index. It’s commendable that someone of his stature would take the time to thoughtfully acknowledge and respond to criticisms of his work in such a transparent way. (It’s earned him praise from Columbia statistician Andrew Gelman, as well.)
The blog — which will make the most sense to people versed in statistics — is about Kahneman’s citations of research on “priming,” in which the memory of something can unconsciously influence a person’s behavior going forward.
As the authors note in their blog about Kahneman’s work:
In the beginning of 2012, Doyen and colleagues published a failure to replicate a prominent study by John Bargh that was featured in Daniel Kahneman’s book. A few month later, Daniel Kahneman distanced himself from Bargh’s research in an open email addressed to John Bargh.
So the researchers decided to rate the studies Kahneman cites according to their statistical power and a measure known as the “R-Index.” As Schimmack, Heene, and Kesavan write in their blog:
To correct for the inflation in power, the R-Index uses the inflation rate. For example, if all studies are significant and average power is 75%, the inflation rate is 25% points. The R-Index subtracts the inflation rate from average power. So, with 100% significant results and average observed power of 75%, the R-Index is 50% (75% – 25% = 50%). The R-Index is not a direct estimate of true power. It is actually a conservative estimate of true power if the R-Index is below 50%. Thus, an R-Index below 50% suggests that a significant result was obtained only by capitalizing on chance, although it is difficult to quantify by how much.
The results are eye-opening and jaw-dropping. The chapter cites 12 articles and 11 of the 12 articles have an R-Index below 50. The combined analysis of 31 studies reported in the 12 articles shows 100% significant results with average (median) observed power of 57% and an inflation rate of 43%. The R-Index is 14.
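The arithmetic in the passage above can be checked with a few lines of Python. This is only a minimal sketch based on the quoted description, not the blog authors' own code: the function name `r_index` is hypothetical, and it assumes the inflation rate is the share of significant results minus the average (median) observed power, which is what the quoted 75% example implies.

```python
def r_index(percent_significant, observed_power):
    """Return the R-Index in percentage points.

    Assumption (inferred from the quoted example): the inflation rate is
    the percentage of significant results minus the observed power, and
    the R-Index is the observed power minus that inflation rate.
    """
    inflation = percent_significant - observed_power
    return observed_power - inflation


# Illustrative example from the blog: all results significant, 75% power.
print(r_index(100, 75))  # 75 - 25 = 50

# Figures reported for the chapter: all results significant, 57% power.
print(r_index(100, 57))  # 57 - 43 = 14
```

Both calls reproduce the numbers given in the quote: an R-Index of 50 for the illustrative case and 14 for the 31 studies in Kahneman's chapter.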
Kahneman’s response to the analysis continues:
The argument is inescapable: Studies that are underpowered for the detection of plausible effects must occasionally return non-significant results even when the research hypothesis is true – the absence of these results is evidence that something is amiss in the published record. Furthermore, the existence of a substantial file-drawer effect undermines the two main tools that psychologists use to accumulate evidence for a broad hypotheses: meta-analysis and conceptual replication. Clearly, the experimental evidence for the ideas I presented in that chapter was significantly weaker than I believed when I wrote it. This was simply an error: I knew all I needed to know to moderate my enthusiasm for the surprising and elegant findings that I cited, but I did not think it through.
I am still attached to every study that I cited, and have not unbelieved them, to use Daniel Gilbert’s phrase. I would be happy to see each of them replicated in a large sample. The lesson I have learned, however, is that authors who review a field should be wary of using memorable results of underpowered studies as evidence for their claims.