A paper published in August that caught the media’s eye for concluding that feeling sad influences how you see colors has been retracted, after the authors identified problems that undermined their findings.
The authors explain the problems in a detailed retraction note released today by Psychological Science. They note that they found sadness influenced how people see blues and yellows but not reds and greens; to support that conclusion, however, they needed to test whether the two effects actually differed from each other. Once they performed that additional test, the conclusion no longer held up.
In the retraction note for “Sadness impairs color perception,” the editor reinforces that there was no foul play:
Although I believe it is already clear, I would like to add an explicit statement that this retraction is entirely due to honest mistakes on the part of the authors.
The paper garnered attention from media outlets such as NPR and the Huffington Post. In the experiment, first author Christopher Thorstenson, a graduate student at the University of Rochester in New York, asked people to watch either a funny video clip or a sad one (the scene from The Lion King in which a cub watches his father die). Afterwards, participants completed a task designed to test their color perception; those who had just watched the sad video had a harder time seeing colors on the blue-yellow axis than those who watched the comedy clip. The authors repeated the experiment with a neutral clip in place of the funny one, and saw the same result.
However, right after the paper appeared, commenters on PubPeer began questioning its conclusions, calling attention to the problem the authors detail in their retraction note. (An even more detailed analysis appears on Areshenk_blog.)
The PubPeer discussion includes some statistical analyses that may not be easy to follow for everyone, so here’s a recap, courtesy of Mind Hacks:
The flaw, anonymous comments suggest, is that a difference between the two types of colour perception is claimed, but this isn’t actually tested by the paper – instead it shows that mood significantly affects blue-yellow perception, but does not significantly affect red-green perception. If there is enough evidence that one effect is significant, but not enough evidence for the second being significant, that doesn’t mean that the two effects are different from each other. Analogously, if you can prove that one suspect was present at a crime scene, but can’t prove the other was, that doesn’t mean that you have proved that the two suspects were in different places.
Indeed, as the authors explain in the retraction notice, once they performed that comparison, the results no longer held:
…this comparative inference required testing the statistical significance of the difference between the BY and RG effects, and subsequent analyses indicated that this difference was not significant (z = 1.58, p = .114, for Experiment 1 and z = 1.07, p = .285, for Experiment 2). Thus, we should not have concluded that our pattern of findings ruled out a motivational explanation.
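The comparison the note describes, a test of whether the two effects actually differ rather than a check that one is significant and the other is not, can be sketched with hypothetical numbers (the z-scores below are illustrative, not the paper's data):

```python
# Sketch of the "difference between significant and not significant"
# fallacy. The z-scores are made up for illustration; the formula for
# the difference assumes two independent effects with equal standard
# errors, in which case z_diff = (z1 - z2) / sqrt(2).
import math

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

z_by = 2.2   # hypothetical blue-yellow effect
z_rg = 0.9   # hypothetical red-green effect

# Tested separately, one effect looks "significant" and the other does not:
p_by = two_sided_p(z_by)   # about 0.03, below .05
p_rg = two_sided_p(z_rg)   # about 0.37, above .05

# But the claim "mood affects BY and not RG" requires testing the
# difference between the two effects directly:
z_diff = (z_by - z_rg) / math.sqrt(2.0)
p_diff = two_sided_p(z_diff)   # about 0.36: the difference is NOT significant
```

Under these assumptions, neither effect can be said to differ from the other even though only one clears the .05 threshold on its own, which is exactly the inferential gap the authors describe.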
This statistical oversight occurs all too often, according to a 2011 paper in Nature Neuroscience, which found that roughly as many papers performed the comparison incorrectly as performed it correctly.
Still, given how thoroughly the authors explained the problems with the article, and how quickly they acted on them (the retraction note appeared two months after the paper was published), we are designating this a case of “doing the right thing.”
Jelte M. Wicherts, an Associate Professor of Methodology and Statistics at Tilburg University who was not involved in the research, also had praise for the authors:
The authors’ choice to retract is very brave and should be an example to many others who are proved wrong by re-analyses and post-publication peer review yet stubbornly cling onto their earlier statements. This is a nice example of how quickly science can correct itself when the data are openly shared, because the reanalysis of the data that were put on the open science framework enabled others to quickly note that the authors had made a conclusion about a crucial interaction that was clearly not supported by the data.
Psychological Science editor D. Stephen Lindsay told us how he learned of the problem:

I first learned of the incorrect statistical interpretation from acquaintances who referred me to Andrew Gelman’s blog posting about it. I am very keen to reduce the rate of such errors in Psychological Science, and to do what I can to enhance the journal’s reputation for rigour, so I was highly motivated to put things right as expeditiously as possible. The very active blogosphere response may well have added to the sense of urgency I felt. It also helped a lot that the authors quickly came to agree that the article should be retracted and that they drafted the retraction. And of course the staff of the APS Publications Office are superbly skilled.
Lindsay said he’s following up with an article about improving the statistics reported in the journal:
I have an in-press editorial titled Replication in Psychological Science, which is part of a continued effort to increase the statistical rigour of works published in Psychological Science. My understanding is that it will be published online on Monday.
We’ve also contacted Thorstenson and last author Andrew Elliot.
Update 11/5/15 2:25 p.m. eastern: We’ve heard from Nick Brown, a PhD student at the University of Groningen who has published several articles questioning the validity of published research (and appeared on our site). He told us he agreed with the retraction:
Retracting the article seems to be the correct decision, given the issues that the authors themselves have reported….It’s certainly unusual for an article to be retracted before it even makes it to the paper version of the journal. It takes considerable courage to own up to the limitations in your own work.
He added that he was part of the post-publication discussion surrounding the article, including on PubPeer:
A small group of people were tweeting about what we thought were some implausible aspects of the article. We got together via e-mail to look at the article and dataset in more detail. The number of identical responses on the blue-yellow axis in Experiment 2 stood out a mile. I also took part in the discussion on PubPeer, although as far as I know none of the others with whom I’ve been looking at the article did so.
We also asked him: Should the reviewers of this article have picked up on the problems that others identified so readily?
…there are some other methodological issues in the article that the reviewers should arguably have picked up on; for example, why was there no baseline measure of the participants’ colour perception before they watched the film clips? The second reason given for retraction was the unusual frequency distribution of the responses. That can’t be detected in the article as published, because it does not contain a table of descriptive statistics; had the skewness of the blue-yellow axis results in Experiment 2 been given, the problem would have been visible immediately. If the authors did not provide the descriptives, the reviewers should arguably have asked for them, as it’s a very basic element of almost any empirical article.
Update 11/5/15 4:16 p.m. eastern: We heard from Thorstenson, who sent us responses on behalf of all the coauthors. Like the journal editor, they first realized the problems from Andrew Gelman’s blog, and chose to act quickly:
We discovered the error of interpretation from a posting on Andrew Gelman’s blog. He had written an article on this topic, and pointed out that our article made the error of interpretation that he had written about. We discovered that a considerable number of participants provided the same response for all blue-yellow judgments when the journal editor brought it to our attention. As for speed in acting, we simply responded to the situation as soon as we learned about it. We initially requested a correction (i.e. Corrigendum) from the editor. After discussion with the editor, we decided on a retraction instead, so that we could redo Experiment 2 before seeking publication of our findings.
They also were not surprised that peer reviewers missed the problems:
The first error is actually quite common (see Nieuwenhuis, Forstmann, & Wagenmakers, 2011, Nature Neuroscience), so perhaps it is not surprising that the action editor and reviewers didn’t catch our oversight. The other problem only becomes clear when one breaks down the descriptive statistics, and the action editor and reviewers did not have this information. This is our oversight alone – we should have done a more thorough analysis of the descriptive statistics before moving on to the inferential statistics.