Journals are failing to address duplication in the literature, says a new study

Mario Malički

How seriously are journals taking duplicated work that they publish? That was the question Mario Malički and colleagues set out to answer six years ago. And last month, they published their findings in Biochemia Medica.

The upshot? Journals have a lot of work to do.

Since we’re often asked why duplication is a problem, we’ll quote from the new paper: Duplication “can inflate an author’s or journal’s prestige, but wastes time and resources of readers, peer reviewers, and publishers. Duplication of data can also lead to biased estimates of efficacy or safety of treatments and products in meta-analyses of health interventions, as the same data which is calculated twice exaggerates the accuracy of the analysis, and leaves an impression that more patients were involved in testing a drug. Not referencing the origin or the overlap of the data, can therefore be considered akin to fabrication, as it implies the data or information is new, when in fact it is not.”

We asked Malički to answer a few questions about the new study.

Retraction Watch (RW): You note that MEDLINE, part of the U.S. National Library of Medicine, tags duplicate articles “whenever substantial overlap between two or more articles is discovered during indexing, irrespective of an authorized notification.” That seems unusual among indexers, yes?

Mario Malički (MM): Yes, to the best of our knowledge, NLM indexers are the only ones doing this among all the large bibliographic databases. NLM has confirmed this practice to us, but not the specifics surrounding it (e.g. are indexers trained to do this, is more than one indexer needed before such a tag goes online, or are there any other control mechanisms when an indexer tags articles as duplicates). As our results indicated, the indexers made mistakes in 35% of cases; nevertheless, we fully support the NLM practice, as thanks to their tagging we were both able to conduct the study and engage editors in correcting the records. It would be ideal if they would inform the journals when an article is tagged, but this goes beyond their practice.

RW: This study took five years. Explain the various steps.

MM: There are several reasons why it took this long. First, this was always a side project I was doing alongside my PhD studies. Second, after I presented our initial results at the 2013 International Congress on Peer Review in Biomedical Publication in Chicago, we were advised to contact the editors, as well as to explore the citation rates of the duplicates and originals, which we initially did not plan to do. We also met with NLM at the conference and shared the cases we thought were their mistakes, most of which they corrected by the end of 2014. So, we contacted the editors in 2015, sent reminders, and then waited for 2 years to see whether they published notices or retracted the duplications. As we discovered that some tags were errors of the NLM, we then decided, for all those cases where we did not get a response from journals, to obtain the full text of the articles and compare them manually – to confirm whether they were indeed duplicates or not. In quite a few cases, it turned out they were not. I then got my postdoc position in Amsterdam, so the final check and manuscript update was done in 2018.

RW: You differentiate duplications that were “due to the publishers’ actions, most commonly publication of the same article in two different issues of the same journal,” from those “occurring due to authors’ actions…most commonly due to submission of the same manuscript to two different journals.” Were the latter cases likely due to publisher error?

MM: No, they were cases when authors misused the publishers’ trust and did not inform them of publications they already had or submitted. We used the term “authors’ actions” rather than misconduct because self-plagiarism is not everywhere defined legally as misconduct. But we definitely see it as a detrimental research practice.

RW: What were your main findings?

MM: Our main finding is that duplicate publications are not addressed – in our study only 54% (n=194 of 359) of duplicates were addressed by journals and only 9% (n=33) retracted, although they should all have been retracted according to editorial standards (e.g. COPE). My personal impression is that duplicate publications are of low interest to the publishers and the media. They are not as “exciting” as cases of fabrication, falsification or plagiarism of data or ideas, but rather an indication of the flaws in the system – the inability to detect that such a publication already exists. If I compare duplicate publications to simultaneous publications of practice guidelines in several journals, which is common practice, it seems very easy to correct duplications – instead of retracting them – publish a notice or second version which clearly states this publication was first published elsewhere. The interesting question is, for those duplicates that were the results of authors intentionally submitting them to two or more different journals – did these authors use those “different” publications to boost their CVs or to obtain grants or funding? In 57% of cases of duplications due to authors’ actions, there were changes in the number or order of authors, so it is likely the authors had specific gains in mind – but this was not something we could investigate in this study.

RW: What would you conclude from these findings, about MEDLINE duplication notices, and about journals’ willingness to take action about potential duplications?

MM: I would like to thank both the NLM and the editors/journals that responded to our queries and did something to correct the records. Unfortunately, the process is too slow. Everyone seems to be unprepared to deal with these issues, and after notifications, investigations took too long – as if it is really hard work to compare two texts and see if they are the same. As for the MEDLINE notices, I applaud NLM for what they tried to do, and hope their next step will be even bolder – to include highly visible descriptive tags in search engines (perhaps even as an addition to the titles of papers) and when citations are exported in any format – to clearly alert users that they are dealing with papers that are duplicates. Such a thing would be welcome, not just for duplications, but also for retractions, and possibly even for simultaneous publications.

6 thoughts on “Journals are failing to address duplication in the literature, says a new study”

  1. I tip my hat to Mario and his co-authors for carrying out and publishing this important study.

    Given that there are widely different views regarding various aspects of duplication (e.g., percent of text that can be recycled, conditions under which data/images may be reused) and that different types of duplication (i.e., covert duplication of data) are much more serious than others (i.e., duplication of text), all the stakeholders need to be on the same page, particularly with respect to any reuse of data. As such, let me repeat a message that, in my view, should become a sort of mantra for all to internalize: The provenance of data, regardless of their form (e.g., numerical, images) must always be crystal clear; it should never come into question.

  2. There’s something else that might be a contributing factor, which I learned of only last week. During the reporting of a series of duplication problems (see http://www.psblab.org/?p=606), I contacted ORI, and was told the following in their response…

    However, re-use of images and data does not meet the definition of research misconduct. Per 42 C.F.R. Part 93.103:

    Research misconduct means fabrication, falsification, or plagiarism in proposing, performing, or reviewing research, or in reporting research results.

    Falsification is manipulating research materials, equipment or processes such that the research is not accurately represented in the research record.

    In each instance of re-use, the research is accurately represented in the research record; thus there is no falsification. The re-use is consistent with self-plagiarism, which also does not meet ORI’s definition of plagiarism. Thus ORI does not have jurisdiction.

    TL/DR: Self plagiarism is not misconduct!

    Any (US) journal retracting on the basis of self-plagiarism would presumably have to base such actions on copyright infringement, and if that were accompanied by an implication that misconduct had occurred, this would potentially render a journal liable for lawsuits from allegedly defamed authors. Without a strong ORI mandate to lean back on, journals may be reticent to get into legal problems such as this.

  3. I interpret ORI’s response, particularly the segment “In each instance of re-use, the research is accurately represented in the research record”, to mean that a reader familiar with the research described in the papers containing the duplication would reasonably recognize the reused data and images as not original/new. I am not in the biomedical field, but even if the duplication is not obvious to a casual reader, I have to wonder whether those who are familiar with this line of work and are willing to read these papers would agree with ORI’s position.

    1. M. Roig: This is not an accurate interpretation of ORI’s position. There is absolutely no insinuation that the reader should/would recognize these images as reused. I have no idea if it is the case in this specific instance, but if images were only reused to report the same findings repeatedly, it would not (on its own) amount to factual misrepresentation of the research record. In many other cases where image duplication has occurred, it has been obvious that the same images cannot represent the different findings being reported (the lanes are re-labeled, etc), and that *would* implicate misconduct. This is the only distinction ORI is making by this statement: if the duplicated images are accurate representations of the same research findings in every instance, it would only be self-plagiarism, an obviously improper practice, but unfortunately not befitting the statutory definition of “research misconduct.”

      Self-plagiarism not fitting in the misconduct regs is “Research Integrity 101” stuff. ORI barely has the resources and staff to pursue their mandate as it is. Although self-plagiarism is extremely poor practice and I, like Dr. Brookes, wish there was more ORI could do in these cases, I just don’t see how it would be possible under the current circumstances.

  4. Dear Dr. Jondoe, I appreciate the clarification. I guess a key issue for me is the extent to which instances of reuse make clear that reuse is taking place. I suppose in most such cases a careful examination of the data/images in the relevant papers by experts in the field can establish the actual status of the research record. But, as sometimes happens (see, for example, the reference to an instance of ‘personal confusion’ by an editor in this RW post: https://retractionwatch.com/2018/11/19/ketamine-for-depression-paper-retracted-for-error-that-double-counted-clinical-trial-participants/), the actual status of the scientific record may be obfuscated by how the reuse is presented to readers, especially when even experts are potentially misled.

  5. I think anyone who is disgusted with the current state of affairs with regard to fraud in biomedical research, or for that matter, academic research in general, should watch Peter Thiel’s video on “The Reasons for the Decline of Western Civilization and Science”