Ready to geek out on retraction data? Read this new preprint

There’s a new paper about retractions, and it’s chock-full of the kind of data that we love to geek out on. Enjoy.

The new paper, “A Multi-dimensional Investigation of the Effects of Publication Retraction on Scholarly Impact,” appears on the preprint server arXiv — meaning it has yet to be peer-reviewed — and is co-authored by Xin Shuai and five other employees of Thomson Reuters. Highlights from their dataset:

  • Medical or biological related research fields tend to have the highest retraction rates.
  • Retracted papers are cited more often — a median of eight times — than the average article (a median of once).
  • The median time from publication to retraction is two years.
  • About half of all retractions are due to misconduct, including plagiarism.
  • Retracted papers, and their authors, are cited less often after retraction.
  • Institutions involved in retractions tend to be cited more often, but “the reputation of those institutions that sponsored the scholars who were accused of scientific misconduct did not seem to be tarnished at all.”
  • Authors of papers retracted for fabrication or falsification see the largest dip in citations, with the “decrease is even more pronounced when the retraction cases are exposed to the public by media.”
  • “[R]etraction rate in one topic hardly affects its future popularity.”

The authors did “yeoman’s work” in coding more than 1,600 retraction notices for criteria such as reason and who requested the retraction, says MIT PhD student  Joshua Krieger, who’s investigated similar issues. (One critique we’d raise is that the new paper relies completely on retraction notices for the reasons for retraction, which — as Retraction Watch readers know — will skew our understanding of trends in why papers are pulled, since not all notices are equally forthcoming about why a paper was retracted. But perhaps that could be a next iteration; we’d be the first to acknowledge just how much work this would be.)

Krieger co-authored a working paper last year that found that authors who initiate their own retractions face no citation penalty as a result, suggesting the community rewards those who do the right thing. Krieger added:

It’s great to see more work that takes a careful and systematic approach to measuring the scientific and reputation fallout from retraction events! The paper really embraces the view that science is cumulative (“standing on the shoulders…”) and that retractions have potential implications for the careers and productivity of scientists with different types of connections to the retracted article. This view seems most useful in thinking about policies regarding retractions, scientific misconduct, and more generally, reproducibility.

The authors have done yeoman’s work in coding over 1,600 retractions for their retraction reasons and source (e.g. editor, author’s request). They put together an impressive analysis data set of retraction author’s career histories and institutions, as well as scientific topics. They manage to merge more standard bibliometric data on publications and citations with a clever application of text analysis and supervised machine learning. They also do a nice job in applying the Lu et al. (2013) method for using pre-retraction citation paths to select control papers/authors for comparisons.

Their findings that retracted authors and papers suffer reduced citation impact (relative to controls) is in line with the other studies on this topic.

Although the paper is “similar to earlier work on many dimensions,” said Northwestern’s Benjamin Jones, one of the co-authors of the paper co-authored by Krieger, the authors “go after a broader set of outcomes in one place and with quite comprehensive data.” One intriguing finding: Institutions with retractions have higher citations. But, added Jones:

Their finding on institutions may well be another way of observing that more highly-cited papers are more likely to be retracted.  Here, it would be that more highly-cited papers tend to come from more highly-cited authors who tend to be at more highly-cited institutions.

The authors suggest that studies are being retracted faster than in the past, which, they write, may be due “to the development of digital libraries and online publishing that facilitate and accelerate scholarly communication.” A previous study found that it took, on average, about three years for papers to be retracted; the new study found a median of two years.

We also ran the study by the University of Washington’s Ferric Fang, a member of our parent organization’s board of directors who has collaborated on a number of studies of retractions:

With the exception of the reliance on retraction notices, the analysis in this paper appears to be generally sound.  The results are corroborative of earlier observations, which suggests that they are likely to be valid.  The findings that retractions result in a decline in citation rates, particularly when misconduct is involved, is a good sign that the system is generally operating as it should.

That finding is, broadly speaking, similar to that of previous studies. Fang continued:

The most original aspect of the study is the determination of the effects of retraction on authors’ institutions and fields.  The results in this regard were essentially negative.  Again, this is hardly surprising, given that institutions and scientific fields are much larger than any individual, and even a high profile retraction would be anticipated to have a negligible effect on citation counts for entire institutions and fields.

However, this is a view from 35,000 feet and should not be taken to mean that science is so robust that retractions don’t adversely impact individual research areas or institutions.  If one drills down to examine specialized sub-fields, the impact of retractions may be seen.  For example, the retraction of a large number of Joachim Boldt’s publications had an impact large enough to alter the conclusions of a meta-analysis on volume resuscitation (Zarychanski et al. JAMA 309:678, 2013).  As another example, the number of papers relating to XMRV fell off sharply following the retraction of Mikovits’ 2009 Science paper in 2011, and I am certain that the citation impact of the field declined as well.  The reason, of course, is that the importance of the virus was diminished by the recognition that it is not involved in the pathogenesis of chronic fatigue syndrome.

Furthermore, citation productivity is not the only measure of institutional impact.  Can one truly say that ‘sponsoring research institutions. . . are not negatively impacted by retraction?’  After the retraction of Obokata’s Nature papers on STAP, Riken cut the Center for Developmental Biology’s funding by 40% and closed many of its labs.  Those researchers, most of whom had no direct involvement in the STAP scandal, would beg to differ.

The authors of the new paper conclude:

A fundamental, yet controversial, question that remains with regards to paper retraction is: As the number of retraction incidences keeps increasing, is it a good or bad signal for the development of science? Some scholars may claim that the drastic increase in retractions suggests the prevalence of scientific misconduct which disobeys the principle of doing science and may harm the authority and activity of scientific research. Others may claim that paper retraction is just a normal mechanism of self-examination and self-correction inherent to the scientific community, and that the increasing rate of retraction indicates the enhancement of that mechanism, which actually benefits scientific development in the long run. Even though we cannot give definite preference to either opinion, our study shows that the increasing retraction cases do not shake the “shoulders of giants”. Only those papers and scholars that are directly involved are shown to be impacted negatively by retractions. In contrast, the sponsoring research institutions, other related but innocent papers and scholars, and research topics are not negatively impacted by retraction. Therefore, from our perspective, while the phenomenon of retraction is worth the attention of academia, its scope of negative influence should not be overestimated.

There are tons more interesting data in the full paper, so read the whole thing here.

Hat tip: Rolf Degen



9 thoughts on “Ready to geek out on retraction data? Read this new preprint”

  1. The disclaimer should state how many retractions are connected to his or her name, along with payments from drug companies as a possible source of conflict of interest.

  2. Indeed, this is a very good topic to talk about. As a matter of fact, I talked about this years ago. A Google search can easily find some of my past talking (publications) on this topic, such as:
    Top journals’ top retraction rates
    Retracted Papers: How to Curtail Their Impact?
    Comment on the Correspondence by Cokol et al

  3. May I also suggest that the authors standardize the listing of their names in a standardized way to avoid being incorrectly cited. Except for the first author, all the remaining authors are listed in the order first name + family name. Only the first author is listed as family name + first name (a Chinese custom). To avoid being cited incorrectly by those who maybe have less cultural perception of names, as Shuai X et al., I suggest to the authors that they swap the order of the first author’s names so that the correct citation will be Xin S et al.

  4. Suggest to the authors to use an editing service of native English speakers. Some authors have a hard time distinguishing between family names and personal names of Western names.

  5. I would argue that the data in Figure 1c, used to support the conclusion of a downward trend in the time delay between publication and retraction, are flawed because the trend is a self-fulfilling prophecy!

    Because the data are from 2014, it’s a physical impossibility for there to be papers in the 2010 cohort with a retraction delay of 4+ years, no papers from 2011 with a delay of 3+ years, no papers from 2012 with a delay of 2+ years, etc. Visually, the red line on this annotated version of their figure shows the upper boundary for retraction delay:

    I would argue that there are probably papers out there in those more recent years that will take a long time to get retracted, if given sufficient time. As such it’s premature to conclude a downward trend because there’s a sampling error – data from more recent years is biased toward short delays.

    A safer approach might have been to take the overall average delay from the full data set (looks to be about 8 years) and cut off the study 8 years earlier, in 2006. The problem is, doing so would likely nullify the conclusion of a downward trend.
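    The objection above can be illustrated with a small sketch. It is not the preprint's analysis: the delay values are made up, and the only thing taken from the comment is the 2014 data cutoff. Every publication year is given the same hypothetical distribution of retraction delays, yet the average observed delay still falls for recent years, simply because long delays cannot have been observed yet.

```python
from statistics import mean

CUTOFF_YEAR = 2014  # year the data were collected, per the comment above

# Hypothetical retraction delays in years (an assumption for illustration).
# The same distribution is used for every publication year, so there is
# no real trend in this toy data.
TRUE_DELAYS = [0, 1, 1, 2, 2, 3, 4, 6, 8, 10]


def observed_mean_delay(pub_year, delays=TRUE_DELAYS, cutoff=CUTOFF_YEAR):
    """Mean delay among retractions that could have occurred by the cutoff."""
    window = cutoff - pub_year  # longest delay that is observable so far
    visible = [d for d in delays if d <= window]
    return mean(visible) if visible else None


def observed_trend(years):
    """Map each publication year to its observed mean retraction delay."""
    return {year: observed_mean_delay(year) for year in years}


if __name__ == "__main__":
    for year, avg in observed_trend(range(2000, 2015)).items():
        print(year, round(avg, 2))
```

    In this toy setup the observed mean shrinks steadily for publication years after 2004 even though nothing about the underlying delays has changed, which is the shape of the bias described above. It does not show that the real trend is an artifact, only that a raw plot of delay against publication year cannot distinguish the two.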

      1. Paul,
        Your caveat is well taken, but I suspect that the authors are in fact correct that retractions are occurring more quickly. Grant Steen, Arturo Casadevall and I published an analysis a few years ago that showed a progressive decline in the time-to-retraction since 2000. We found that among 714 retracted articles that were published between 1973-2002, retraction required an average of 50 months to retract, whereas among 1,333 retracted articles that were published after 2002, retraction required only 24 months. It is also important to analyze time-to-retraction by cause of retraction, as papers retracted for misconduct take longer to retract. Overall our model estimated that about half of retracted papers are retracted within 20 months of publication (Steen et al. PLoS One, 8:e68397, 2013).

  6. Well spotted Paul

    What the authors really need to do here is a survival analysis or two. Their stated aims in the paper relate to “the rate of publication retractions” and survival analysis methods are designed to assess such rates.

    The event of interest is retraction.

    The outcome of interest is the hazard rate for time to retraction. The authors may want separate survival models by year or decade of publication and so on.

    Looking at the hazard rate over the publishing years will show how the various rates are changing.

    Papers not yet retracted will be censored at their maximum follow-up time.

    Reviewers should reject this paper for publication as it stands, and encourage the authors to consult with a knowledgeable survival analyst to set up models that will address their stated aims appropriately. Setting up the risk sets as papers appear over time to match their stated aims will be a bit tricky but doing it properly will take care of the time bias spotted by Paul Brookes.

    Kudos to the authors for posting to arXiv, they will be able to receive valuable peer review and improve the paper before submission.
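    As a rough sketch of what the survival analysis suggested above might look like, here is a minimal Kaplan-Meier estimator in plain Python. The function name, the input format, and the toy data are all assumptions for illustration; the preprint's authors did not publish this code, and a real analysis would more likely use an established statistics package. Each paper contributes a follow-up time and a flag saying whether it was retracted (the event) or is still unretracted at the end of follow-up (censored).

```python
def kaplan_meier(observations):
    """Kaplan-Meier estimate of the probability of remaining unretracted.

    observations: iterable of (time, retracted) pairs, where `retracted`
    is True if the paper was retracted at `time` and False if it was
    censored (still unretracted) at `time`.
    Returns a list of (time, survival_probability), one per event time.
    """
    obs = sorted(observations)
    at_risk = len(obs)  # papers still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(obs):
        t = obs[i][0]
        events = 0   # retractions at time t
        leaving = 0  # retractions plus censored papers at time t
        while i < len(obs) and obs[i][0] == t:
            events += 1 if obs[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times in years: two retractions, two censored.
    toy = [(1, True), (2, False), (3, True), (4, False)]
    print(kaplan_meier(toy))
```

    Fitting one such curve per publication-year cohort, with unretracted papers censored at their maximum follow-up time, is one way to compare retraction speed across years without the short-follow-up bias raised in the previous comment.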
