High-profile paper that used AI to identify suicide risk from brain scans retracted for flawed methods

(Photo: Marcel Adam Just)

In 2017, a paper published in Nature Human Behaviour made international headlines for the authors’ claim that they had developed a way to analyze brain scans using machine learning to identify youth at risk for suicide.

“It was a big, splashy finding,” said Timothy Verstynen, an associate professor of psychology at Carnegie Mellon University in Pittsburgh, who was not involved in the research. But at a neuroimaging conference soon after the publication, other researchers discussed the study “in kind of a sense of disbelief,” he said. 

The 91% accuracy the researchers reported for identifying suicidality, from a sample of just a few dozen participants, he said, “kind of went against what we as a field were starting to understand about the nature of these brain phenotype markers based off of neuroimaging data.”

After six years of scrutiny, during which Verstynen attempted to replicate the work but found a key problem, the authors of the 2017 paper have retracted the article. 

The paper, “Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth,” has been cited 134 times, according to Clarivate’s Web of Science. Altmetric, which tracks the online attention publications receive, places the article in the top 5% of all the articles and other content it has tracked. 

The retraction notice, published today, stated: 

The authors are retracting this article after concerns were raised about the validity of their machine learning method in a Matters Arising. While revising their response to these concerns, the authors confirmed that their method was indeed flawed, which affects the conclusions of the article. Specifically, the stepwise classification method used in the article overestimated the classification accuracy of who is a suicidal ideator because the features of the classifier were tuned to that particular dataset. The authors aim to demonstrate the predictive value of machine learning applied to fMRI data for the classification of suicidal ideators using new data and analyses in an independent future publication. All authors agree to this retraction.

Marcel Adam Just, the first and corresponding author of the paper and also a psychology professor at Carnegie Mellon, has not responded to our request for comment. 

Although the retraction notice was published today, after being made available to the press under embargo, the paper’s title changed earlier this week to include “RETRACTED ARTICLE,” with a link on the article page to a non-working retraction notice. (This sort of thing has happened periodically for years, suggesting problems with publishers’ production systems.) We asked the publisher if we still had to wait to report the retraction, and a spokesperson for Nature Human Behaviour told us:

Owing to an administrative error, the paper was updated to list the retraction prior to the retraction notice being published. This is being corrected and the retraction remains under embargo.

Researchers began airing questions about the study and its small sample size on Twitter soon after it was published, and the journal’s Twitter account responded in those threads.

A 2018 letter to the editor of Nature Human Behaviour questioned whether the results would be generalizable “across culturally diverse populations,” and pointed out a potential for bias in the methods. The authors’ response defended their work and stated that a larger trial to confirm the findings was underway.

In 2019, Simon Eickhoff, director of the Institute of Systems Neuroscience at Heinrich Heine University in Düsseldorf, Germany, tweeted that the paper was “extremely poor science,” and Nature Human Behaviour’s Twitter account replied in the thread.

Eickhoff and some colleagues published a Matters Arising in 2021 that formalized and expanded on his critiques of the sample size and the way the authors developed their machine learning model. The authors’ response pushed back on the criticisms, while acknowledging the limitations of the sample size, and again stated that they “look forward to sharing the forthcoming results from our ongoing larger study.”

Meanwhile, in 2020, Verstynen was looking for examples of prediction models to use in his data science course. He remembered the 2017 paper and thought it would be worth diving into.

Verstynen mentioned the idea to another researcher, Konrad Paul Kording of the University of Pennsylvania, who would later become his co-author on the new Matters Arising mentioned in the retraction notice. The two scientists talked about how the paper “seems like either an absolutely brilliant methodological feat, or there’s something possibly wrong with how the classification was done,” Verstynen said. 

Kording pointed out that the authors had made their code and some of their processed data available following earlier criticism, and Verstynen dug in.

He reread the paper and noticed an “inconsistency” in how the authors selected the features to go into their model. “The way it was described made it seem like there was a possibility of information leakage happening in the development of the model, where information from the holdout set was leaking into the cross validation,” he said. 

Information leakage, in which the data used to train an algorithm contains information it shouldn’t have, such as clues from the held-out cases it will later be tested on, is a common problem in machine learning, and can result in the algorithm performing far better in tests than it does in the real world.
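To see why this inflates results, here is a minimal sketch in Python with scikit-learn. It is not the authors’ actual code or data; the sample size, feature count, and model below are invented for illustration. If the “best” features are chosen using the entire dataset before cross-validation, even pure noise can appear to classify well; selecting features inside each training fold removes the leak.

```python
# Minimal illustration of test-set leakage via feature selection
# (hypothetical data; not the retracted paper's analysis).
# Labels are random noise, so honest accuracy should be near 50%.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_subjects, n_features = 34, 2000           # small sample, many voxel-like features
X = rng.standard_normal((n_subjects, n_features))
y = rng.integers(0, 2, size=n_subjects)     # random labels: no true signal

# Leaky: pick the 20 most discriminative features using ALL subjects,
# including the ones that will later serve as held-out test cases.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_acc = cross_val_score(LogisticRegression(), X_leaky, y, cv=5).mean()

# Honest: redo the feature selection inside each training fold only.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression())
honest_acc = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky accuracy:  {leaky_acc:.0%}")   # typically well above chance
print(f"honest accuracy: {honest_acc:.0%}")  # near chance, ~50%
```

On data with no signal at all, the leaky version routinely reports accuracy far above chance, the same flavor of overestimate the retraction notice describes: features “tuned to that particular dataset.”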

Verstynen also tried replicating the work, but couldn’t get the same results.

“Initially I thought I was doing it wrong,” he said. But after a month of working with the code, he came to suspect something was wrong with the paper’s methods.

In April 2020, he emailed the paper’s authors about his difficulty replicating their work and his concern about information leakage in their model. They thanked him, he said, but he didn’t hear anything further.

Verstynen and Kording next reached out to Nature Human Behaviour with their concerns and, at the editor’s recommendation, submitted their Matters Arising in the summer of 2020. The original authors submitted a reply, and both documents went through peer review.

After Verstynen and Kording got their reviews and made minor revisions, they didn’t hear anything for a while. Eventually, after they pushed the journal’s editor for a response this January, two and a half years after submission, they learned their Matters Arising would be published.

But instead of the authors’ reply being published with it, the original paper would be retracted. 

Eickhoff, the scientist who published the 2021 Matters Arising, told us he saw “no fundamental difference” between his concerns and the ones Verstynen and Kording articulated. 

Both commentaries, and the possibility of retraction, were under consideration at the journal at the same time, he said. The last he heard was that the article would stand along with the two critiques. He told us he was “perplexed” by the timing of the retraction. “I personally feel that keeping the article online – together with the matters arising – is preferable over a retraction.”

Verstynen didn’t start off trying to get the paper retracted; he was simply seeking clarification about the methods, he said. But given the importance of the topic, as well as the media attention the paper received and the clinical trial that followed it, “it seemed warranted to really push to have this addressed.”

Eickhoff and Verstynen both say that the field has evolved in the last six years to be more robust, so the retraction won’t have much effect on current methods. The paper probably couldn’t get published today, Verstynen said, but there are “probably a lot of studies” from the same time period that have similar problems.  

Verstynen said he “can’t fault the journal” for following its process to reevaluate the paper, but more openness about what was going on would be helpful for the research community. “Given how long it took the process to play out, we’re talking half the lifespan of this paper was spent with this issue known to the journal.” 

The national 988 Suicide and Crisis Lifeline “provides 24/7, free and confidential support for people in distress, prevention and crisis resources for you or your loved ones, and best practices for professionals in the United States.”


