What happens to copies of retracted papers on non-publisher websites (e.g., PubMed Central)?

One of the important questions when it comes to retractions is, what happens to retracted papers? How do readers find out they’re retracted? There’s evidence they are cited less often, but that when they are cited, the vast majority of the time it’s as if they were never retracted.

So with all of that in mind, Phil Davis, an independent researcher and publishing consultant, set out to find

…Internet copies of 1,779 retracted articles identified in MEDLINE, published between 1973 and 2010, excluding the publishers’ website. Found copies were classified by article version and location. Mendeley (a bibliographic software) was searched for copies residing in personal libraries.

Many of the 321 copies of 289 retracted articles Davis found were on PubMed Central (PMC), the “free full-text archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health’s National Library of Medicine.” Here are the highlights of the results, which Davis published last month in the Journal of the Medical Library Association:

Just over one-quarter (26% or 82) of retracted articles located in this study contained some retraction statement. Sixty-six of these 82 copies (80%) were accessible from the page-view in PMC—a format that provides access to an article 1 page at a time within a larger web page, which contains bibliographic data. Removing these and focusing on full PDF files of retracted articles, just 16 (5%) of the articles contained some form of retraction statement.

Davis could only see the bibliographic information in Mendeley, not the whole PDF, so he didn’t speculate as to whether those were marked.

In other words, just 5% of the full PDF files contained a retraction statement. (Minor note: When summarizing these results, the paper’s abstract gives the number as 15 rather than 16. When we pointed this out to Davis, he immediately contacted the publisher, who will issue a correction in the journal’s October issue. The typo doesn’t affect the results. Kudos to Davis for taking care of it so quickly.)
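For readers who want to check how those percentages fit together, here is a minimal sketch in Python. It assumes, as the quoted passage suggests but does not state outright, that the 26% and 5% figures both use the 321 located copies as their denominator. The variable and function names are ours, not Davis's.

```python
# Counts as reported in the post. Using 321 copies as the denominator
# for the 26% and 5% figures is an assumption on our part.
total_copies = 321              # copies of retracted articles Davis located
with_statement = 82             # copies carrying some retraction statement
page_view_with_statement = 66   # of those, copies shown in PMC's page-view

# Copies with a statement that are full PDF files rather than page-views
pdf_with_statement = with_statement - page_view_with_statement


def pct(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print("Any retraction statement:", pct(with_statement, total_copies), "%")
print("Share of those in page-view:", pct(page_view_with_statement, with_statement), "%")
print("Full PDFs with a statement:", pdf_with_statement, "copies,",
      pct(pdf_with_statement, total_copies), "%")
```

Under that assumption, the counts reproduce the 26%, 80%, and 5% figures quoted above, and the subtraction gives 16 copies, matching the body of the paper rather than the abstract.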

Here’s how Davis summarized the paper for Retraction Watch:

Taken together, the results claim:

  1. That copies of retracted papers are widely available, and easily discoverable, from non-publisher websites, that
  2. Few of these copies indicate that the papers were subsequently retracted, and that
  3. References of these papers are widely found in the libraries of a popular reference manager

We asked Davis why he focused, in the paper’s abstract, on the 5% figure, rather than the 26%. The former, of course, paints an even bleaker picture of what’s on non-publisher websites. He responded:

Consider the PMC page-view rendering of a retracted article.

As a reader, I have to display each of the 7 pages individually and I can’t download the entire article or print it off as a single article. The rendering of the text is not good, I can’t copy and paste relevant statements, and it is hard for me to read this article from my screen. I think most readers would find this an unacceptable version to do more than give a cursory view. As a reader, I want the full PDF file of the article.

The PMC page-view display of the article also provides an opportunity to display a retraction message in the metadata. In the above example, the statement “This article has been retracted” is found just above the title information in the bib record. However, you will note that there is no indication on the PDF file itself that the article has been retracted. Had I found a copy of the full PDF on a publicly accessible website, I would miss any indication that this article was retracted.

So, to answer your question, I feel that an unmarked full PDF of the publisher’s version poses the biggest threat to potential readers and as such, focused on these.

Fair enough. Regardless of which number is the better one to use, they’re both low. But we should note that publishers aren’t exactly perfect when it comes to notifying readers, either. As Grant Steen found in a 2010 study:

Journals often fail to alert the naïve reader; 31.8% of retracted papers were not noted as retracted in any way.

Davis’s work was “funded by the Publishers International Linking Association, which oversees the operation of CrossMark,” one potential solution to the awareness problem. As we’ve written elsewhere, CrossMark is a

…clickable logo that will let a reader know whether there have been any corrections, retractions or other revisions. It is a solution to the fact that such changes are at best difficult to find — and are sometimes not mentioned at all on ‘current’ versions of papers.

Theoretically, that means authors will avoid citing retracted papers positively — or at least know when they’re citing such studies.

But Davis, who offers some ways to prevent unknowingly citing retracted papers, notes in the study:

As CrossMark is unable to push alerts to readers or replace older PDF files on user machines with current copies, it is likely that retracted papers—especially older papers published without the CrossMark symbol—will be cited for some time.

August 22nd, 2012


  • John Mashey August 22, 2012 at 3:29 pm

    Crossmark looks useful, but given how easy it is to use Acrobat to at least add a “retracted” header to every page of a PDF, it’s too bad people don’t do that.

  • Geraint Duck August 23, 2012 at 6:03 am

    I’d be interested to know how such retractions (particularly those in PMC/MEDLINE) affect text-mining applications. Do these retracted papers remain in the applicable open-access subsets (BMC, etc.)? If so, current and future text-mining studies will be automatically mining these studies for information which could be wrong, and already stated as such.

    Does anyone have any experience with, or links to, ways to automatically associate retractions with their papers in text-mining applications?

    • Gråsten August 23, 2012 at 10:22 am

      You should not be solely dependent on text-mining for your analyses – simple as that…

  • Angela August 23, 2012 at 4:33 pm

    When I was at a librarian panel, they were talking about how important it is for publishers to allow authors to post manuscript versions of the final peer-reviewed papers in their open access library repositories. They said that they will try to link to the final published version but they don’t usually get the needed information from the authors. I asked what happens if a paper is corrected or retracted? Do they note this with the posting or remove the paper from the repository? Of course not! They have no way of keeping track of that information. You cannot demand to be in control if you do not take the responsibility for that which you are controlling.

  • Dan Eisenberg August 26, 2012 at 11:01 am

    I just requested that EndNote add a feature to their software to scan for retractions (

