A journal has issued an expression of concern for a 17-year-old paper by one of the world’s most prominent behavioral psychologists after it partly failed a statistical stress test conducted by a group that has been trying to reproduce findings in the field.
The 2004 article, by Dan Ariely, of Duke University but then at MIT, and James Heyman, then a PhD student at the University of California, Berkeley, was published in Psychological Science. Titled “Effort for Payment: A Tale of Two Markets,” the article looked at the relationship between labor and payment for that work:
The standard model of labor is one in which individuals trade their time and energy in return for monetary rewards. Building on Fiske’s relational theory (1992), we propose that there are two types of markets that determine relationships between effort and payment: monetary and social. We hypothesize that monetary markets are highly sensitive to the magnitude of compensation, whereas social markets are not. This perspective can shed light on the well-established observation that people sometimes expend more effort in exchange for no payment (a social market) than they expend when they receive low payment (a monetary market). Three experiments support these ideas. The experimental evidence also demonstrates that mixed markets (markets that include aspects of both social and monetary markets) more closely resemble monetary than social markets.
But when a team of students led by Gilad Feldman, a psychologist at the University of Hong Kong, used statcheck to assess the paper — which has been cited more than 400 times, according to Clarivate Analytics’ Web of Science — they found problems.
According to Feldman, the replication analysis was one of roughly 100 that he and his students (as well as some early-career researchers) have been conducting over the past three years:
we have been doing this with many classics in our field. The targets are chosen by me, as a tribute to a work I feel has been important and influential in my domains of interest. I choose targets for replication as a tribute to this work, in directions that I think are promising. Targets are not chosen because of any concerns I had, quite the contrary.
In this case, he said, the results were mixed:
I feel it is important to begin by saying that for the most part we have successfully replicated the findings by Heyman and Ariely (2004) Study 1, twice, with different samples, using different designs. The main difference in our results was that regarding one analysis we were able to “detect” an effect where none was expected, which we thought required a slight reframing of the theory/phenomenon. This difference was probably due to our samples being much better powered than the original. During that process, we identified some issues with the reporting of the statistics in the original article, yet proceeded with the replications regardless with some needed adjustments for the uncertainty (bigger samples).
Feldman said his team submitted their replication study to Psychological Science, which rejected the paper because it wasn’t a Registered Report. But Patricia Bauer, the editor-in-chief of the journal:
was concerned by the inconsistencies we identified and proceeded to follow up with the original authors. She later contacted me to inform us that the journal has decided to issue an expression of concern for this article.
Feldman provides more details here. According to the notice:
The corresponding author of the article and coauthor of this statement, Dan Ariely, attempted to locate the original data in an effort to resolve the ambiguities but was unsuccessful. Because the ambiguities cannot be resolved, we decided to issue an Expression of Concern about the confidence that can be held in the results reported in the article.
The notice cites:
13 “errors” or mismatches. Five of the discrepancies were associated with Experiment 1 of the article, four with Experiment 2, and four with Experiment 3. The values reported in the article and those resulting from the statcheck recalculation are provided in Table 1. In seven of the instances of discrepancy between the reported and recalculated values, the p value of the test changed, but the status of the test—as statistically significant or not statistically significant—remained the same. That is, the test was reported as statistically significant and remained so or was reported as nonsignificant and remained so. Thus, these discrepancies do not impact the interpretation or conclusions. In six of the instances of discrepancy, the outcome of the statistical test was different, resulting in what is known as a “decision error.” In these instances, there is a mismatch between results reported as significant and the results of the recalculations, which indicate that the tests are not statistically significant. These discrepancies, if the recalculated values are correct, would change the interpretation of the data in a manner that substantively alters the conclusions drawn from the research.
The notice also cites “a lack of specificity regarding the analytic approach adopted in the article,” which, it states, could help explain the findings in the statcheck rerun.
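For readers unfamiliar with the tool, statcheck (an R package) extracts reported test statistics from a paper and recomputes the p values they imply; a mismatch that flips a result between significant and nonsignificant is what the notice calls a “decision error.” The sketch below, written in Python with hypothetical numbers rather than with the actual statcheck package or any values from the article, illustrates the idea.

```python
# A minimal sketch, in Python rather than the actual statcheck R package, of
# the kind of consistency check described in the notice: recompute the p value
# implied by a reported test statistic and flag mismatches, including
# "decision errors" that flip the significance verdict.
from scipy import stats

def check_t_result(t_value, df, reported_p, alpha=0.05, tol=0.0005):
    """Recompute a two-tailed p value for a reported t(df) statistic and
    classify any discrepancy with the reported p value."""
    recomputed_p = 2 * stats.t.sf(abs(t_value), df)
    mismatch = abs(recomputed_p - reported_p) > tol
    # A decision error means the recomputed value crosses the alpha threshold
    # in the opposite direction from the reported value.
    decision_error = mismatch and ((reported_p < alpha) != (recomputed_p < alpha))
    return recomputed_p, mismatch, decision_error

# Hypothetical numbers, not taken from the Heyman & Ariely article:
p, mismatch, decision_error = check_t_result(t_value=1.95, df=30, reported_p=0.04)
print(f"recomputed p = {p:.3f}, mismatch = {mismatch}, decision error = {decision_error}")
```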
The EoC, co-written by Ariely and Bauer — whose journal isn’t afraid to scrutinize old publications — concludes that:
Given the ambiguities, the confidence we place in the conclusions drawn from the research is diminished. However, again, given the ambiguities, the Editor in Chief decided not to change the official publication record of the article through a Corrigendum. Instead, the corresponding author and editor are issuing this Expression of Concern and note that the differences between the values reported in the published article and the values recalculated through statcheck, and the lack of specificity regarding the analytic approach, undermine confidence in these data and the conclusions drawn from them.
Ariely responded to our request for comment in an audio file, in which he said he was “delighted” with the replication effort by Feldman’s group. He acknowledged using a “strange” statistical analysis with a software program that no longer works, and said:
It’s a good thing for science to put a question mark [on] this.
Ariely added that he wished the journal would have published Feldman’s replication analysis, along with the EoC:
Most of all, I wish I kept records of what statistical analysis I did.
He added:
That’s the biggest fault of my own, that I just don’t keep enough records of what I do.
Feldman praised the students who have been helping him on the replication project:
Students and early-career researchers are the key to the “credibility revolution” and they are our most underappreciated underutilized stakeholder. We have shown that undergraduates, as early as 2nd year, can help us reexamine our literature and conduct high-quality replications and extensions that meet the standards of the best journals in our field. Students repeatedly describe this experience as meaningful and they are enthusiastic about being part of actual hands-on science process that contributes back to the community.
And Ariely said:
This is the way that science should progress.