Which kind of peer review is best for catching fraud?

Serge Horbach

Is peer review a good way to weed out problematic papers? And if it is, which kinds of peer review? In a new paper in Scientometrics, Willem Halffman, of Radboud University, and Serge Horbach, of Radboud University and Leiden University, used our database of retractions to try to find out. We asked them several questions about the new work.

Retraction Watch (RW): You write that “journals’ use of peer review to identify fraudulent research is highly contentious.” Can you explain what you mean?

Willem Halffman and Serge Horbach (WH and SH): The precise role of the peer review system has long been discussed. Two expectations of the system are more or less universally accepted: peer review is supposed to help improve the quality of a submitted manuscript, and it is expected to distinguish between high- and low-quality work. However, there are quite a few expectations of the peer review system that are not as widely shared. These include granting equal and fair opportunities to all authors (regardless of gender, nationality, etc.), providing a hierarchy of the most significant published results, and detecting errors or outright fraud in submitted papers. Some claim that peer review cannot be expected to perform such functions, as it was never designed or meant to do so. Others point out that the peer review and editorial systems are increasingly being remodelled to detect fraud, supported by recent developments such as text similarity scanners, image manipulation scanners, and the establishment of editorial ‘integrity czars’. In addition, when new cases of misconduct come to light, the peer review system is often blamed for not filtering out the fraudulent research before it could enter the academic literature. Researchers talk about peer review as if we all know precisely what it is and what it is for, but there is actually considerable variation hidden under that general term.

RW: You also write that “In research on scientific integrity, retractions are sometimes used as indicators of misconduct or questionable research practices. However, this requires extreme caution.” Please explain.

Willem Halffman

WH and SH: The reason for this caution is threefold. First, it is commonly known that a substantial share of retractions is not due to misconduct or questionable research practices, but rather results from (honest) errors by authors, editors, or publishers. Second, retractions suffer from what criminologists call the ‘dark number issue’: only a small and unknown proportion of articles with (severe) issues gets retracted, while a potentially large proportion of such articles remains untouched. Last, retractions have a dual meaning: they signal both trouble and the willingness to address trouble. Hence, not having retractions could indicate either a lack of issues or an unwillingness to tackle them retrospectively. Similarly, a rising number of police arrests can be seen as a sign of rising crime, or of increased police activity, or both.

RW: You write that “Unfortunately, journal web pages present surprisingly incomplete information about their peer review procedures, even for procedures currently in use.” What would you consider best practices in terms of presenting such information?

WH and SH: There are many ways in which journals could be more transparent about their editorial policies and peer review procedures. We believe journals should simply publish this information on their webpages, in sufficient detail. For example, stating that peer review is ‘open’ or ‘double-blind’ is still ambiguous, because these terms could mean many things. Rather, we urge journals to be specific: what does ‘open’ mean in your case? Does it mean reviewer identities are known, or that review reports are freely available? And open to whom exactly? Does ‘blinding’ include the removal of author names from the reference list, or only from the title page? In addition to presenting this information on their webpages, journals could even disclose it at the article level, indicating on each published article how it was reviewed and who was involved in the process.

RW: Tell us about the classifications of peer review that you used.

WH and SH: We used a classification of peer review procedures that we established in our previous work. In that work, we traced the development of different review procedures, including the reasons for their establishment. This led to a classification based on twelve dimensions, including the level of author anonymity, the level of reviewer anonymity, the timing of peer review in the publication process, and the use of digital tools such as text similarity or statistics scanners.
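
To make the idea of “dimensions” concrete, here is a minimal, purely illustrative sketch in Python of how a few of these dimensions could be recorded for a single journal. The field names and categories below are hypothetical shorthand, covering only a subset of the twelve dimensions and not the authors’ actual coding scheme.

```python
from dataclasses import dataclass
from enum import Enum


class AuthorAnonymity(Enum):
    """How much of the authors' identity is hidden from reviewers (hypothetical categories)."""
    DISCLOSED = "disclosed"
    TITLE_PAGE_BLINDED = "blinded on title page only"
    FULLY_BLINDED = "fully blinded, including reference list"


@dataclass
class ReviewProcedure:
    """Hypothetical record of one journal's peer review procedure,
    illustrating a few of the twelve dimensions mentioned in the interview."""
    author_anonymity: AuthorAnonymity
    reviewer_identity_disclosed: bool   # 'open' in the sense of known reviewer names
    review_reports_published: bool      # 'open' in the sense of publicly available reports
    reviews_before_data_collection: bool  # e.g. the registered reports model
    uses_text_similarity_scanner: bool
    uses_statistics_scanner: bool
    selects_on_anticipated_impact: bool


# Example: a journal that fully blinds authors, publishes review reports,
# and screens submissions with a text similarity scanner.
example = ReviewProcedure(
    author_anonymity=AuthorAnonymity.FULLY_BLINDED,
    reviewer_identity_disclosed=False,
    review_reports_published=True,
    reviews_before_data_collection=False,
    uses_text_similarity_scanner=True,
    uses_statistics_scanner=False,
    selects_on_anticipated_impact=False,
)
```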

RW: Which characteristics were associated with the most retractions? The fewest retractions?

WH and SH: Several review procedures show significant differences in the number of retractions associated with them. Some of the most prominent differences concern the level of author anonymity: blinded author identities are associated with significantly fewer retractions compared to review procedures in which author identities are disclosed. Also, review procedures that use ‘anticipated impact’ or ‘novelty’ as a selection criterion are associated with significantly more retractions, as are procedures that use no digital tools such as similarity scanners. In contrast, journals using plagiarism or statistics scanners are associated with fewer retractions, as are journals using the pre-submission (registered reports) review model.

RW: What recommendations would you make to editors who want to improve their peer review processes?

WH and SH: Beyond analyzing peer review procedures and their relation with retractions in general, we broke down this relation by research field and reason for retraction. This provides editors with some guidance on how to alter their review procedures in order to tackle issues that are particularly relevant to their journal. For instance, registered reports might be a very prominent way of improving review procedures in some fields, but far less applicable in others. In addition, we acknowledge that decisions on peer review procedures are based on many more factors than preventing retractions alone. We therefore recommend that editors and publishers consider our findings about more effective peer review procedures in the context of their own particular editorial concerns, not as one-size-fits-all rules.

RW: Yours is one of the first studies to use our database of retractions, which we officially launched in October. What would you tell others who might want to make use of it as a tool?

WH and SH: The Retraction Watch database is without equal: getting similar numbers of retractions out of databases such as Web of Science or PubMed is virtually impossible. In addition, the information on the reason for retraction is very valuable and not present in other databases. While the data are quite comprehensive, researchers using them should be very careful about their interpretation. For example, there is a very real temptation to use this dataset as a list of ‘bad publications’, a sign of ‘bad behaviour’ that can be related to contextual factors such as countries, institutions, or research profiles. At the same time, retractions are a sign that journals are taking action, of a new way of addressing problems in the published record. It is a fantastic resource for integrity researchers, but also one that should be interpreted with care.

One thought on “Which kind of peer review is best for catching fraud?”

  1. 20 December
    Thanks for the interview.
    It depends on the editor and the journal. I review about 50 manuscripts a year and reject more than 90%, whatever the journal. Among the rejected papers, about 10% display different types of misconduct.
    To detect misconduct, I need the full set of previous papers by the same research team (to detect self-plagiarism) and, where necessary, the primary data (to detect fabricated or falsified data); primary data are important for obtaining original pictures with embedded information (date, apparatus, sample number, institute identification…) or the raw data behind complex plots (e.g. XRD profiles, XPS spectra…). I was surprised that some editors agree fully while others refuse, writing to me that I am the first to make such a request.
    Thanks for reading my comment.
