October, apparently, is “studies of retractions month.” First there was a groundbreaking study in PNAS, then an NBER working paper, and yesterday PLoS Medicine alerted us to a paper their sister journal, PLoS ONE, published last week, “A Comprehensive Survey of Retracted Articles from the Scholarly Literature.”
The study, by Michael L. Grieneisen and Minghua Zhang, is comprehensive indeed, reaching further back into the literature than others we’ve seen, and also including more disciplines:
We found 4,449 scholarly publications retracted from 1928–2011. Unlike Math, Physics, Engineering and Social Sciences, the percentages of retractions in Medicine, Life Science and Chemistry exceeded their percentages among Web of Science (WoS) records. Retractions due to alleged publishing misconduct (47%) outnumbered those due to alleged research misconduct (20%) or questionable data/interpretations (42%). This total exceeds 100% since multiple justifications were listed in some retraction notices.
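To see why the category percentages in the abstract can add up to more than 100%, here is a minimal sketch with made-up data. The five notices below are hypothetical and are not drawn from the study; the function name `category_percentages` is ours. The only point is that a notice citing two reasons counts toward both categories.

```python
from collections import Counter

# Hypothetical toy data: each retraction notice lists one or more reasons.
# These five notices are invented for illustration; they are not from the study.
notices = [
    {"publishing misconduct"},
    {"publishing misconduct", "questionable data"},
    {"research misconduct", "questionable data"},
    {"questionable data"},
    {"publishing misconduct"},
]


def category_percentages(notices):
    """Return the share of notices citing each reason.

    A notice that lists several reasons is counted once per reason,
    so the shares are not mutually exclusive.
    """
    counts = Counter(reason for notice in notices for reason in notice)
    return {reason: 100 * n / len(notices) for reason, n in counts.items()}


percentages = category_percentages(notices)
total = sum(percentages.values())

for reason, share in sorted(percentages.items()):
    print(f"{reason}: {share:.0f}%")
print(f"sum of category shares: {total:.0f}%")
```

In this toy set, two of the five notices cite two reasons each, so the shares sum to well over 100% even though every notice is counted only once per category.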
Consistent with other studies, retractions were rare but increasing — and heavily concentrated among certain authors:
Though widespread, only miniscule percentages of publications for individual years, countries, journals, or disciplines have been retracted. Fifteen prolific individuals accounted for more than half of all retractions due to alleged research misconduct, and strongly influenced all retraction characteristics. The number of articles retracted per year increased by a factor of 19.06 from 2001 to 2010, though excluding repeat offenders and adjusting for growth of the published literature decreases it to a factor of 11.36.
Ferric Fang, a co-author of the recent PNAS study on retractions, offered this assessment:
Their article is a useful contribution to the literature on retractions and represents a great deal of effort, as I can attest from personal experience. There are a number of respects in which this study and the recent study that I published with Arturo Casadevall and Grant Steen agree. For instance, the rate of retraction has risen in recent years, even after correction for the increasing number of published articles and the contribution of authors with multiple retractions, but the number of retracted papers still represents a very small percentage of all published articles. In addition, the authors have analyzed retractions referenced in databases other than PubMed, which has allowed them to survey additional scientific fields. This has permitted them to conclude that retractions occur across diverse disciplines.
We’d agree that this is a big contribution to the literature about retractions, not least because it gathers a significantly larger dataset than any previous studies. One of the things that we noticed, however, was that the authors relied on the retraction notices themselves:
The information given in retraction notices was taken at face value, and no attempt was made to independently verify the accuracy of the statements made in the notices.
Fang called this “a methodological weakness” — a critique with which we have to agree. Along with the authors’ system of categorization — more on that in a minute — he said it was
…a major factor contributing to their conclusion that “most retracted articles do not contain flawed data, and the authors of most retracted articles have not been accused of research misconduct.” On the basis of our independent analysis, I would dispute these conclusions.
Indeed, in the recent paper Fang published along with Steen and Casadevall, two-thirds of more than 2,000 retractions listed in Medline since 1973 were due to misconduct. That seemed to contradict earlier studies showing, as this new one seems to, that most retractions were due to error. But that’s because Fang et al. went beyond the notices themselves, relying on Office of Research Integrity reports, Retraction Watch, and other sources to suss out the real reasons for retraction in opaque notices. As we wrote at the time:
It’s now clear that the reason misconduct seemed to play a smaller role in retractions, according to previous studies, is that so many notices said nothing about why a paper was retracted.
Fang said he found the authors’ categorization system “confusing”:
Specifically, the classification of data falsification or fabrication, plagiarism and intentional duplicate publication as forms of “author error” is confusing, as most studies have characterized these practices as misconduct, as opposed to error. Similarly, lumping together methodological or analytical errors with data irreproducibility results in a category error because, as you know, we found a number of cases in which data irreproducibility was cited for what turned out to be suspected or documented fraud.
Fang also said that the authors’ approach
…weakens the authors’ suggestion that the stigma of misconduct should be ‘de-coupled’ from retraction.
A Chinese proverb is said to state that “the beginning of wisdom is to call things by their right names.” Thus, to understand retractions and to address their underlying causes, it is important not to limit our understanding to the incomplete and sometimes misleading information provided in retraction notices.
We agree, of course. We’ve asked Zhang for comment, and will update when we hear back.
Update, 4 p.m. Eastern, 11/4/12: Zhang responds:
The perceived discrepancies between our PLoS One paper and Fang’s PNAS paper seem to be largely due to semantics and some differences we found in subsequent analyses of PubMed and non-PubMed retractions.
The PNAS article found that among retracted articles in the biomedical literature, 67.4% were retracted due to some form of misconduct (either fraud, plagiarism or duplicate publication). Our PLoS One article found almost exactly the same percentage across all scholarly fields of the literature. However, we divided the category of “misconduct” into two groups, i.e., “research misconduct” and “publishing misconduct.” “Research misconduct”, roughly equivalent to Fang’s category of “fraud”, applied to about 20%, while “publishing misconduct”, predominantly plagiarism and duplicate publication, applied to 47% of all the articles in our survey. These two categories add up to about 67% (though there is slight overlap between the two). We divided “misconduct” into these two categories, and introduced the category of “distrust data or interpretations,” to allow us to quantify the total proportion which contain suspect data, whether due to data manipulation or artifacts. Our statement that “Authors of most articles are not accused of research misconduct” is based only on the subset of cases which Fang classified as “fraud,” and does not include figures for plagiarism and duplicate publication.
In our follow-up article (now under review) which focuses on all categories of “misconduct”, we have found very different proportions of articles retracted due to (a) fraud versus (b) the sum of plagiarism + duplicate publication between the PubMed and non-PubMed literature. This likely explains the discrepancies between our two studies regarding percentages for each of the 3 sub-categories of “research misconduct” (i.e., fraud), plagiarism and duplicate publication in the biomedical literature (PNAS article) versus all scholarly literature (PLoS One article).
Regarding sources of information on retraction cases, we felt the sometimes ambiguous information in retraction notices collectively provided a “level playing field” for all of the retracted articles, while extra information from RetractionWatch and the ORI cases published in Federal Register only apply to two specific subsets of retracted articles. One subset is the articles retracted from 2010-2012 which would be reclassified based on the information from RetractionWatch skewing the comparisons with data from prior years. Another subset is the articles by USA authors which would be reclassified based on the information from ORI cases skewing comparisons with data for all other countries.
Finally our statement that “Most articles do not have flawed data” is based on the 42% of articles classified as “Questionable data or interpretations” shown in Fig. 3 of our PLoS One paper (see attached). The authors of the PNAS paper mentioned a 15.9% reclassification rate from “error” to “fraud” and give examples where retraction notices mentioned vague “questions” or “irregularities” regarding the data. Since we would have classified such notices under “Distrust data or interpretations”, most of the reclassifications probably would have involved shifts from the “Distrust data or interpretations” to “Fraudulent/fabricated data” categories in Fig. 3; so it might have had little effect on the 42% total for “Questionable data or interpretations” that we found across all scholarly literature.
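A note on the arithmetic in Zhang’s response: the 20% and 47% figures add up to 67% only if no article falls into both categories. Because Zhang says there is a “slight overlap,” the share of articles retracted for at least one form of misconduct would be somewhat lower. The sketch below applies the standard inclusion-exclusion rule; the overlap values are hypothetical, since the response does not give the actual figure, and the function name `combined_share` is ours.

```python
def combined_share(research, publishing, overlap):
    """Percent of articles in at least one of two categories.

    Inclusion-exclusion: articles counted in both categories are
    subtracted once so they are not double-counted.
    """
    if not 0 <= overlap <= min(research, publishing):
        raise ValueError("overlap must be between 0 and the smaller category")
    return research + publishing - overlap


RESEARCH = 20.0    # percent, "research misconduct" per Zhang's response
PUBLISHING = 47.0  # percent, "publishing misconduct" per Zhang's response

# Hypothetical overlap values; the true overlap is not stated in the response.
for overlap in (0.0, 1.0, 2.0):
    share = combined_share(RESEARCH, PUBLISHING, overlap)
    print(f"overlap {overlap:.0f}% -> at least one form of misconduct: {share:.0f}%")
```

With no overlap the combined figure is 67%, which is the upper bound; any overlap pulls it down by the same amount.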
Zhang also suggested that we add “research” in front of “misconduct” in this post’s title to more accurately reflect the findings, which we’ve done.
We also note that while we see the distinction Zhang and her co-author are drawing between “research misconduct” and “publishing misconduct,” the Council of Science Editors makes no such distinction:
Research misconduct generally falls into one of the following areas: mistreatment of research subjects; falsification and fabrication of data; piracy and plagiarism. . . Plagiarism generally involves the use of materials from others but can apply to researchers’ duplication of their own previously published reports without acknowledgment.