When is it acceptable to use some of the same data in separate papers?

Duplication (sometimes referred to, with a lack of precision, as “self-plagiarism”) is a frequent cause of retractions. Usually, it’s of text that authors have used elsewhere. But what about data? In our new LabTimes column, we describe a hypothetical situation:

A research group conducts a study of the effects of treatment Y on disease X in 18 patients and publishes their findings in the Journal of Medical Plausibility. A year later, they submit a second paper looking at treatment Y and disease X, this time in 27 patients – the initial set plus nine more. In both cases, the scientists report similar rates of efficacy and side effects.

Question: Have they committed publishing misconduct?

You can read more of our thoughts at the original column, including about a real-life case of this we’ve covered, and we look forward to those of Retraction Watch readers.

Written by Ivan Oransky

July 11th, 2012 at 11:00 am

  • Prashanth July 11, 2012 at 11:12 am

    In your example, I am presuming the argument “against” publishing the second study could be:

    (1) No significant increase in sample size(?)
    (2) Nothing new/novel
    (3) No addition to the body of knowledge

    None of these is, for me, a reason to think of this as misconduct. Of course, we can all accuse the authors of (ab)using the system to increase their publications, but in my opinion good arguments could be made for this. Hell, it could be argued that such studies are needed even!….or am I missing something here?

    • Fernado Pessoa July 11, 2012 at 11:42 am

      In reply to Prashanth July 11, 2012 at 11:12 am

      “Hell, it could be argued that such studies are needed even!….or am I missing something here?”

      Studies by others to confirm or refute a finding, yes; but not endless updates by the same authors that do not change the meaning.

      So you may be missing something.

  • jaspevacek July 11, 2012 at 11:12 am

    The report should be quite clear that the original 18 patients were already reported on. I would hope that, unless significantly new findings emerge, the reviewers would suggest not publishing, as the results are routine and not worthy of publication.

    If this situation is plagiarism, then aren’t most meta-analysis efforts also plagiarism? The data has already been reported.

    • Andrew Miller July 11, 2012 at 12:19 pm

      I agree with this point. The editor should be made aware of the study context, especially previously published papers, ideally in a cover letter. The editor then makes an informed decision.

    • Paul A. Thompson July 11, 2012 at 1:32 pm

      This is the relevant point. If IF IF the paper HONESTLY and CORRECTLY indicates that 67% of the cases in Paper 2 were already published in Paper 1, the reviewer can then make a fully informed decision. If the paper does not report this, it is self-plagiarism.

      Reviewers have a responsibility to not allow certain things to be published.

  • failuretoreplicant July 11, 2012 at 11:26 am

    Unless the authors acknowledge in one or both papers that they are using the same sample or mostly the same sample, this appears to be fraud.

    Think about type 1 error: Consider a situation in which just by random chance, people assigned to the drug treatment rather than the placebo treatment were going to improve more despite the drug having no effect. Ten outcome measures are tested, and all show improvement in the drug group because of this. If each outcome measure were published in a separate paper, then the evidence in favor of the drug would appear overwhelming. However, in reality, all can be explained by a single type 1 error. The drug may have absolutely no effectiveness at all.

    All outcome measures tested should be reported, and when the authors choose to publish different measures in different papers, this should be explicitly stated in each paper.
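The type 1 error argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the drug has no effect at all, each patient has one underlying factor that feeds into all ten outcome measures, and the arm size, the `shared_weight` mixing parameter, and the function names are hypothetical choices rather than anything taken from a real study. It counts how often every one of the ten outcomes happens to favor the drug arm.

```python
import random
from statistics import mean


def all_outcomes_favor_drug(rng, n_per_arm=20, n_outcomes=10, shared_weight=0.9):
    """Simulate one trial of an ineffective drug.

    Returns True if the drug arm has the higher mean on every outcome.
    shared_weight (hypothetical) controls how much each outcome depends on
    a single patient-level factor rather than on independent noise.
    """
    def patient():
        latent = rng.gauss(0, 1)  # factor shared by all of this patient's outcomes
        return [
            shared_weight * latent + (1 - shared_weight) * rng.gauss(0, 1)
            for _ in range(n_outcomes)
        ]

    # Both arms are drawn from the same distribution: the drug does nothing.
    drug = [patient() for _ in range(n_per_arm)]
    placebo = [patient() for _ in range(n_per_arm)]
    return all(
        mean(p[k] for p in drug) > mean(p[k] for p in placebo)
        for k in range(n_outcomes)
    )


def fraction_all_favor(shared_weight, n_trials=1000, seed=0):
    """Fraction of null trials in which all outcomes favor the drug arm."""
    rng = random.Random(seed)
    hits = sum(
        all_outcomes_favor_drug(rng, shared_weight=shared_weight)
        for _ in range(n_trials)
    )
    return hits / n_trials


correlated = fraction_all_favor(shared_weight=0.9)   # outcomes move together
independent = fraction_all_favor(shared_weight=0.0)  # outcomes unrelated
print(f"all ten favor the drug, correlated outcomes:  {correlated:.3f}")
print(f"all ten favor the drug, independent outcomes: {independent:.3f}")
```

If the ten outcomes were truly independent, all ten favoring a useless drug would happen about once in 2**10 = 1,024 trials, which is why ten separate papers can look overwhelming. When the outcomes share a patient-level factor, as they do when measured on the same sample, one lucky randomization moves all ten together, and `correlated` comes out far higher than `independent`.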

    • failuretoreplicant July 11, 2012 at 11:35 am

      I guess I was thinking more about the effect of treatment Y on disease X and disease Z, but just adding more participants inflates Type I error too. Not a good thing to do statistically, and not a good thing to do ethically as well.
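The point about adding participants can be sketched the same way. The simulation below assumes, purely for illustration, a treatment with no effect, normally distributed outcomes with a known standard deviation of 1, and a simple two-sided z-test at the 5% level; the sample sizes 18 and 27 come from the hypothetical in the post, while the function names and the number of simulated studies are made up. It compares the false-positive rate of a single analysis of all 27 patients with the chance that at least one of the two analyses (the first 18, then all 27) comes out significant.

```python
import math
import random

Z_CRIT = 1.96  # two-sided 5% threshold for a z-test


def is_significant(values):
    """Two-sided z-test of 'mean is 0', assuming a known standard deviation of 1."""
    z = sum(values) / math.sqrt(len(values))
    return abs(z) > Z_CRIT


def false_positive_rates(n_first=18, n_added=9, n_sims=20000, seed=1):
    """Return (rate for one analysis of all patients, rate for either analysis)."""
    rng = random.Random(seed)
    single_look = 0
    either_look = 0
    for _ in range(n_sims):
        # The true effect is zero, so every 'significant' result is a false positive.
        first = [rng.gauss(0, 1) for _ in range(n_first)]
        combined = first + [rng.gauss(0, 1) for _ in range(n_added)]
        sig_first = is_significant(first)
        sig_combined = is_significant(combined)
        single_look += sig_combined
        either_look += sig_first or sig_combined
    return single_look / n_sims, either_look / n_sims


single, either = false_positive_rates()
print(f"false positives, one analysis of 27 patients: {single:.3f}")
print(f"false positives, either of the two analyses:  {either:.3f}")
```

Under these assumptions the single analysis stays near the nominal 5%, while the chance that at least one of the two overlapping analyses is a false positive is higher. The two results are also strongly correlated because they share 18 patients, so they should not be read as two independent confirmations.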

  • RW July 11, 2012 at 11:31 am

    I definitely wouldn’t consider that any kind of misconduct, but were I to referee such a paper I wouldn’t recommend publishing it. In my field at least, the checklists journals ask you to fill out when submitting a referee’s report always ask if the paper is significant enough to warrant publication. If it draws the same conclusions as previous studies and offers nothing new, I’d say it would not be significant enough to warrant publication.

    • Marco July 11, 2012 at 11:41 am

      I guess “significance” depends on the situation: suppose this relates to a disease that is very rare? In that case increasing the sample size from 18 to 27 may definitely be worth considering.

      Of course, and perhaps more important, do we not have an *obligation* to publish this expansion of the sample size? If somebody does a meta-analysis, he would not be able to include all available data. It does mean that the efforts to recruit these extra patients must be significant, or we get hundreds of publications going from 1 to 2 to 3 to 4 to 5 to 6 to 7 (repeat ad infinitum) patients.

  • Andy C. July 11, 2012 at 12:05 pm

    What happens when you copy-and-paste text from a previous paper
    (by the same group) and produce a second paper two years later?

    To be specific: The paper

    “Influence of colloid suspensions of humic acids on the
    alkaline hydrolysis of N-methyl-N-nitroso-p-toluene sulfonamide”,
    International Journal of Chemical Kinetics
    Volume 42, Issue 5, pages 316-322, May 2010
    By G. Astray et al.

    is suspiciously similar to the paper

    “Influence of colloid suspensions of humic acids upon the
    alkaline fading of carbocations”
    Journal of Physical Organic Chemistry
    Volume 21, Issue 7-8, pages 555-560, July – August 2008
    By M. Arias-Estevez et al.

    Both papers have been written by the research team of Prof.
    Juan Carlos Mejuto of the University of Vigo in Spain (he is
    coauthor in both), who is now well-known since a plagiarism
    affair involving his team was uncovered in May 2011 by the
    leading Spanish newspaper “El Pais”:

    The German press (Frankfurter Allgemeine Zeitung and Der Spiegel)
    has extensively reported on this story.

    For some reason, the editor of “International Journal of Chemical Kinetics”
    does not consider it to be self-plagiarism, but at least the introduction is copy-and-pasted.

    Could it be that retracting this paper would bring discredit to the journal
    (because of the sloppiness of the refereeing process)? Aren’t there any
    objective criteria for this? Does not exist a “data bank” of systematic

    Could Retraction Watch please address this issue? Thanks.

    • Marco July 11, 2012 at 1:19 pm

      Andy C., I strongly recommend you contact the Editor of JPOC if you have good evidence of plagiarism and feel strongly about the topic (note: I did not check your findings).

      The Editor of JPOC, Luis Echegoyen, is no light-weight (even trying to become president of ACS), and has some experience with the topic of scientific misconduct through NSF. You can expect an honest and fair evaluation from him. And remember, it is *his* journal that has seen its article copied and thus potentially stands to ‘lose’ citations.

      If it is only the introduction that is substantially plagiarized, I think the best you can expect is that they send a warning. It’s the same publisher, so there are no legal concerns.

      • Andy C. July 11, 2012 at 2:01 pm

        Marco, thanks for your comment and advice. It is not a case of plagiarism because it is the same group (albeit not exactly the same authors), but it smells strongly of self-plagiarism, and the group has a record of plagiarized papers (at least two, already withdrawn from the Journal of Chemical and Engineering Data in January 2011). Yes, I feel strongly about the topic because [1] the first author of the two retracted papers (and co-author of the two papers mentioned in my previous post) received his PhD just one week after the leading Spanish newspaper El País reported extensively on the plagiarism affair (20 May 2011). [2] The same guy was suggested by a panel of the University of Vigo for an excellence award for his “outstanding” PhD thesis in 2011. And [3] last but not least, the leader of the copy-and-paste group (Prof. Juan Carlos Mejuto) received in September 2011 an award for his “excellence in research” amounting up to Eur. 112.000 from the regional government of Galicia (Spain). The politician ultimately responsible for this award (Jesús Vázquez) was in turn Dean of the School of Business at the same campus of the UVigo (in Ourense), where in 2010 another case of plagiarism happened (this time it was a report, not a paper).

        In countries such as Germany and Hungary the press put pressure on the Minister of Defence (Germany) and on the President of the country (Hungary) because they had plagiarized parts of their PhD theses many years ago. They finally resigned because of the pressure of the media. In Spain, the country which invented “picaresca” (i.e., cheating as a philosophy of life), plagiarism is not only tolerated… it is rewarded! This is what burns me, and the scientific community in Spain seems not to care about it.

      • Marco July 11, 2012 at 3:03 pm

        Andy C., you will need to have more than “smells like”, but then self-plagiarism is still unacceptable. Kindly pointing to previous retractions may make Echegoyen (even) more willing to look at your evidence, and pressure the other journal. But you need a good case, meaning you may have to pull the two articles through a plagiarism checker. The “compare” function in Adobe Acrobat doesn’t work, unfortunately.

    • chirality July 12, 2012 at 4:05 am

      “but at least the introduction is copy-and-pasted.”
      Obviously, English is a foreign language for Spaniards, so it must be really difficult for them to write a totally new introduction every time they write a paper on a subject similar to the one they have already covered in some previous publication. Assuming that the data presented in the second paper are original, I would not call this a case of scientific fraud. This is simply a copyright issue. This might be a big deal for the publisher of the original paper but for the wider scientific community the identical introductions in two research papers are of no consequence. If you publish a series of papers on the same subject, writing introductions is like describing the same cat over and over again. Even Joseph Conrad would have run out of original sentences eventually.

      • Jon Beckmann July 14, 2012 at 11:26 am

        It’s not only difficult for Spaniards, in my experience… After you spend a substantial amount of time to write a good introduction, it is difficult for anyone to come up with a different version that is as good or better. And if it is not as good or better, then why write a new one? It’s silly. You guys are falling for something that is passed as “ethics”, but in reality is only a little con set up by publishers to convince US to protect THEIR interests.

      • David Hardman July 14, 2012 at 12:33 pm

        In reply to chirality July 12, 2012 at 4:05 am

        I thought that the topic was
        “When is it acceptable to use some of the same data in separate papers?”.
        Introductions do not count as data.

        I do not see how “Even Joseph Conrad would have run out of original sentences eventually.” is relevant.
        He did not run out of original sentences. It did not happen. His books seem to be telling different stories, obviously from his own mind, so there is some kind of unity to them.

        “If you publish a series of papers on the same subject” is the issue. After a while there may not be much more to learn, and it stops being science. The sooner people realize this the better. Some never do.

  • Linda July 11, 2012 at 12:43 pm

    It’s self-plagiarism to cut and paste from the first article.
    Re the data, it depends on whether they told the second journal about the first paper. If they didn’t, that’s not being straight.

  • Tim D. Smith (@biotimylated) July 12, 2012 at 1:07 am

    A similar story appeared this morning in my RSS feed from Mary Ann Liebert’s Tissue Engineering Part A — a paper pulled for repeating data from three other publications with the same first author:

  • Fernando Pessoa July 12, 2012 at 4:06 am

    In reply to Tim D. Smith (@biotimylated) July 12, 2012 at 1:07 am

    Already covered here:

    Will this create a “time-loop”?

  • Jon Beckmann July 12, 2012 at 9:43 am

    When you have a large dataset and each paper focuses on different aspects of it, then it’s appropriate. Using the same dataset to test different analysis methods is fine as well. Reanalyzing data because of new hypotheses is also acceptable to me.

    • Rafa July 12, 2012 at 4:44 pm

      Well I have seen people using this excuse, however with comic results. If one cares to check, in the series of examples below, similar data are seen: 1) first pair of studies – similar data generating the same conclusions; 2) second pair of studies – identical data generating different conclusions.



      2a) Gomes, Leonardo ; Gomes, Guilherme ; Oliveira, Helena G. ; Von Zuben, Claudio J. ; Silva, Iracema M. da ; Sanches, Marcos R. ; GOMES, L. . Efeito do tipo de substrato para pupação na dispersão larval pós-alimentar de Chrysomya albiceps (Diptera, Calliphoridae). Iheringia. Série Zoologia, v. 97, p. 239-242, 2007.

      3a) GOMES, L. ; ZUBEN, C . Postfeeding radial dispersal in larvae of (Diptera: Calliphoridae): implications for forensic entomology. Forensic Science International, v. 155, p. 61-64, 2005.

      Amazing. And yet, never retracted.

      • Jon Beckmann July 14, 2012 at 5:21 am

        If they cite their previous work, say that they used data from the previous study, and there is something new in the paper, I see no problem. BTW, is that Zuben or Von Zuben?

      • Rafa July 14, 2012 at 12:40 pm

        Beckmann, agreed in general terms. However:

        In the first pair of papers, one doesn’t cite the other and the narrative is identical for different results (meaning the same conclusions republished in verbis). The second pair is an absurd case, in which the exact same results are presented with completely different conclusions.

      • Rafa July 14, 2012 at 7:48 pm

        BTW full name of the responsible person for this (and much more) is Claudio Jose Von Zuben, director at UNESP university. Some cite him von Zuben, others Zuben. I do not know — and probably he doesn’t care — which is the correct form.

  • Marc July 14, 2012 at 12:26 pm

    These two papers… bizarre… almost exactly the same… some rework, copy-pasting etc, but the same;

    – Clin Dev Immunol. 2012;2012:397648. Epub 2012 Mar 15.
    Immunotherapy using dendritic cells against multiple myeloma: how to improve?
    Nguyen-Pham TN, Lee YK, Kim HJ, Lee JJ.
    Received November 4, 2011; Accepted January 2, 2012. Published online 2012 March 15

    – Korean J Hematol. 2012 Mar;47(1):17-27. Epub 2012 Mar 28.
    Cellular immunotherapy using dendritic cells against multiple myeloma.
    Nguyen-Pham TN, Lee YK, Lee HJ, Kim MH, Yang DH, Kim HJ, Lee JJ.
    Received December 12, 2011; Revised February 9, 2012; Accepted March 2, 2012. Published online 2012 March 28

    I actually like the quality; they are both nice overviews of the subject. But if you ask me, it is a big NO-GO to publish it twice in different journals….

  • Dylan July 16, 2012 at 3:16 am

    In answer to the question of your blog title, I think an example where duplication, or more precisely, iteration is acceptable/useful is in the study and intervention of orphan/rare disorders.

    An example would be a cohort of 10 children with x disease due to y genotype. Whilst an initial open label study demonstrates that z treatment results in a moderate reduction in seizures following 8 weeks of treatment, a second study adds another 5 children, and increases the follow-up to 6 months and shows that at 6 months the treatment has a significant but modest reduction in seizures.

    I hope the example above is clear in what I’m trying to convey, which is that some disorders are so difficult to study that even slight advances and iterations in knowledge are useful to clinicians who manage these patients, and therefore to families.

    • Clare Francis July 29, 2012 at 11:24 am

      In reply to Dylan July 16, 2012 at 3:16 am

      “slight advances and iterations in knowledge are useful to clinicians”. Could you give real-life, published examples? Medicine is quite conservative. I doubt that “slight advances” would change much. The real danger is that studies which overlap significantly confuse the picture, in that they tend to overestimate the efficacy of treatments. More data, if not presented in a transparent way, is not always better.

      Perhaps the “17% of systematically searched randomised trials of ondansetron as a postoperative antiemetic” that “were covert duplicates” should be retracted.

  • Biomanipulation July 29, 2012 at 10:56 am

    There has been constant competition to be ahead in the scientific community, with one’s own data as the trump card. However, this has led to a rise in various types of manipulation and new ways of fabricating the data presented in research articles. This link gives a glimpse of some papers that have come under the scanner for such disgraceful work…

