The journal Digestion has a retraction notice that’s, well, an amusing morsel.
At issue was a 2011 paper on a biomarker for liver cancer by a group of Turkish authors who plagiarized from the work of others.
Here’s the notice for the article, titled “Diagnostic and Prognostic Validity of Golgi Protein 73 in Hepatocellular Carcinoma”:
We regret to inform you that the criticism raised by the Editorial Board is correct concerning the similarity between some parts of the texts present in our article published in Digestion [2011; 83: 83–88], and the papers in the Journal of Hepatology [2005; 43: 1007–1012] and in Hepatology [2009; 49: 1421–1423], although the research data are completely independent.
We apologize for this unfortunate error, which was established during the writing process of the manuscript by the author Harun Erdal. Although the final version of the submitted paper had been examined by all authors, they failed to recognize the ‘transferred parts’ of the papers in the Journal of Hepatology [2005; 43: 1007–1012] and in Hepatology [2009; 49: 1421–1423]. Thus, for the sake of scientific clarity and based on the above-mentioned facts, we prefer to retract our paper Diagnostic and Prognostic Validity of Golgi Protein 73 in Hepatocellular Carcinoma. Digestion 2011; 83: 83–88 (DOI:10.1159/000320379).
The notice lists all the authors.
The plagiarized paper from the Journal of Hepatology was this one: “GP73, a resident Golgi glycoprotein, is a novel serum marker for hepatocellular carcinoma;” the one from Hepatology was an editorial: “Golgi protein 73 as a biomarker of hepatocellular cancer: Development of a quantitative serum assay and expression studies in hepatic and extrahepatic malignancies.”
My university provides plagiarism detection software to anyone who teaches. I now use it during the review process as well. Perhaps journals could automatically run all submissions through similar software before manuscripts get sent out for review. It would be easy to implement and fairly cheap.
It’s not always that simple. We use CrossCheck to scan all papers published in our journal, but the software often misses plagiarized sections. We caught them because we have a final step in which we proof the English prior to publication and notice changes in writing style. We check by googling the suspect passages and end up with hits to other manuscripts.
They probably copied and pasted some introduction bits from the two papers. Harun Erdal, for whom English is presumably a foreign language, decided to make his job easier by lifting text from others without proper attribution. However, the other authors “examined” the final version of the manuscript yet failed to notice that the quality of writing was not, more or less, the same throughout.
If Erdal suffered from writer’s block or the task he was assigned to complete was too overwhelming, he should have followed the copying and pasting by reshuffling and occasional synonym replacement. It would have sufficed.
“…he should have followed the copying and pasting by reshuffling and occasional synonym replacement. It would have sufficed.”
Even if he were to cover his tracks better, it would still be plagiarism.
Not necessarily. Particularly for pieces of the introduction, wherein one establishes the “state of the field” by citing relevant work, there are a limited number of ways to say something. Consequently, the act of reshuffling and occasional synonym replacement may well result in a completely distinct way of stating the same material. Or, at any rate, one that is at least as distinct as if another individual had started from scratch.
For instance, there are just shy of 290,000 papers concerning insulin that come up in a quick PubMed search (topic chosen at random). Are there really 290,000 distinct ways of saying that insulin is a 2-strand peptide containing 3 disulfide bonds that acts to regulate blood glucose? And what about the next paper; do they need an entirely new, distinct way of saying it? As papers are perpetually being added to the scientific literature, this problem is going to become increasingly prevalent. You’ve probably faced it in your work as well–whether you are aware of it or not.
So there end up being a number of factors, including (1) how long the segment was, (2) the extent of reshuffling and synonym replacement, (3) the number of published, redundant and equivalent statements. This is one of the areas that becomes a little grey with respect to plagiarism, especially since quoting other pieces of scientific literature directly–even using quotation marks to designate it and citing it accordingly–is somewhat taboo (although I have never understood the reasons for this taboo). These factors can easily combine to make what the author may have believed to be an original statement into plagiarism.
The end result that I am left with is that intention is paramount. However, the intention of the authors with respect to plagiarism is often unclear, and the authors can hardly be relied upon to give truthful answers regarding their intentions. In addition to being virtually impossible to determine, many would argue that intentions don’t count for anything–it is the end result that counts for everything.
And this brings us back to the grey area of what constitutes plagiarism. In my own writing, I always start afresh, but I often find (after the fact) that I had unwittingly taken short phrases from the papers that I re-read while preparing myself to start writing–and as Chirality pointed out, this is almost exclusively in the introduction. Is it plagiarism? No, because I never had the intention of copying their work, and these 3-10 word sentence fragments are the best phrasing with respect to clarity, which is often more important than novelty in an introduction.
I think it’s a more complicated issue than you make it out to be, failuretoreplicant. It’s one thing to take a [single] compound sentence, switch the order of the dependent and independent clauses, and then switch some of the non-scientific (and therefore less technical/specific) words out for their synonyms. It’s an entirely different matter to copy several paragraphs, merely switching the order of the paragraphs and a few synonyms in each paragraph. The latter is clearly plagiarism, but the former may just be an attempt to preserve clarity while expressly avoiding plagiarism.
Before you distinctly state that reshuffling and synonym replacement is always plagiarism, here’s an exercise. Find 290,000 distinct ways of stating the following: “Insulin is a peptide hormone composed of two parallel chains connected by two disulfide bonds, with a third disulfide linking the shorter of the two chains to itself; it is synthesized in the pancreas and released in healthy individuals in response to elevated blood glucose concentrations.” In coming up with 290,000 distinct phrasings, you’re welcome to break it up into more than one sentence. Oh–given your assertion–maybe you should avoid just reshuffling the sentence and doing synonym replacement. Only 289,999 ways to go!
Copying sections of someone’s work, reshuffling the order and changing wording doesn’t make it not plagiarism. It just masks plagiarism.
“Copying sections of someone’s work, reshuffling the order and changing wording doesn’t make it not plagiarism. It just masks plagiarism.”
It’s not that simple–something that you apparently refuse to recognize. In fact, the algorithm that you proposed may result in something that is more different from the original work than if another author sat down and started from scratch, without ever having encountered the original work, but having read other, similar papers. So what you’re saying is that plagiarism boils down to the process and intentions of the authors more than the final product. This is an interesting view, but as I pointed out, it is impossible to determine either the process or the intentions of the authors unequivocally. Therefore, by your definition, it is impossible to unequivocally determine whether or not something is plagiarism unless it is a verbatim segment of at least 2-3 sentences.
And what about plagiarism that happens within a single paper? For instance, my PI in graduate school objected to a statement (in an “Author Contributions” section) that the idea for a paper was conceived by me (it was–he even told me to drop the project when I brought him preliminary data), thereby implicitly stating that he had conceived the idea (since he was the last author, and the work was supported by a grant to him, the [erroneous] implication was that he had conceived the idea in writing the grant). Since all plagiarism involves an implicit (and not explicit) statement of credit for an idea that is not the author’s, this was undeniably plagiarism. But this form of plagiarism is essentially accepted within the scientific community–enough so that very few journals have Author Contributions sections. The paper ended up in a journal that doesn’t have such a section, so given that this section didn’t exist–was it plagiarism?
You can call it black and white if you want. Many cases (and probably the one referenced above) are either black or white. But to refuse to acknowledge the existence of the grey area is likely to be more detrimental to the end goal of eliminating plagiarism altogether than to acknowledge the grey areas and discuss them.
Sure, there will be misses: someone copies a section of someone’s work and changes it enough so that the plagiarism isn’t detected.
Sure, there will be false alarms: two authors write a section with phrasing very similar or identical, but do so independently.
I don’t believe we should let that deter us from trying to catch people who are behaving unethically. If the similarity is a coincidence, then the authors can say so; if it’s an honest mistake by a junior author, give the authors a chance to fix it. I bet the authors of this paper wish they had been notified before publication instead of ending up with a retraction on their record.
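The “misses” described above can be illustrated with a toy overlap measure. The following is only a sketch, assuming word 3-grams (“shingles”) compared by Jaccard similarity; the function names are hypothetical, and this is not a description of how CrossCheck or any other detection product actually works.

```python
def shingles(text, n=3):
    """Return the set of consecutive n-word tuples in text (lowercased, punctuation stripped)."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=3):
    """Share of n-word shingles that two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0  # nothing to compare
    return len(sa & sb) / len(sa | sb)


original = "Insulin is a peptide hormone that acts to regulate blood glucose."
reworded = "Insulin, a peptide hormone, functions to control blood sugar."

print(jaccard(original, original))  # verbatim copy: every shingle matches
print(jaccard(original, reworded))  # light rewording: most shingles no longer match
```

Under these assumptions a verbatim copy shares every shingle, while a lightly reworded sentence shares very few, which is one way a check based on exact phrasing can produce a miss. By the same logic, two authors who independently land on the same stock phrasing would score high, which is the false alarm.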
… I think I now understand where our difference of opinion arises. You are discussing conceptual plagiarism, wherein someone states a claim that was established only in a couple of articles without citing those articles, often to make that claim seem more novel. On the other hand, I am discussing a more literary approach to plagiarism, in which one worries about reusing exact phrasing for statements that could be considered common knowledge, based on achieving a ‘critical mass’ within the literature (although common knowledge to a biologist is different from common knowledge to a chemist, complicating this issue).
I agree that to state someone else’s idea without citation is plagiarism, regardless of wording. I believe that the grey area arises when discussing background information (e.g. an Introduction section, as Chirality suggested), particularly when the paper that might be getting plagiarized is cited elsewhere in the manuscript (but not quoted directly, as this is taboo). This grey area exists because different people have different concepts of what constitutes “common knowledge.” While I may think it’s common knowledge that insulin has an intrachain disulfide bond, you may not. This difference would be a critical factor contributing to the existence of grey areas.
Since software that looks for plagiarism (as you mentioned) can only look for literary plagiarism, I was trying to direct the focus of the discussion to that, rather than the stages before a given concept has made its way into common knowledge, as that situation is much more cut-and-dried, and therefore less interesting as a discussion topic.
To change the subject slightly, I’m going to note the technique described by Simon Rayner of googling a suspect passage, and pile on to the discussion at hand by asserting that googling is limited to about 32 words…
Apparently this technique is quite popular and I’d like to hear from people who have used it extensively.
32 words is generally enough to point you to a sentence in the, er, related paper. While some authors are ‘sourcing’ a single paper, what we’ve found is that it’s also quite common to cut and paste individual sentences from several papers to form a plagiarism combo. Thus, we find three or four papers that have single sentences copied verbatim in the Introduction and Discussion section. What usually gives them away is the change from near perfect style in these sections to something less than perfect in the Results section. We are able to do this final review step because our number of submissions is manageable.
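The sentence-by-sentence check described above can be prepared mechanically. A minimal sketch, assuming a hypothetical helper `search_queries` and taking the 32-word cap from this thread rather than from any documented limit; it only builds quoted phrase queries and does not contact a search engine.

```python
import re

MAX_QUERY_WORDS = 32  # figure quoted in this thread; an assumption, not a documented limit


def search_queries(passage, max_words=MAX_QUERY_WORDS):
    """Split a passage into sentences and return one quoted phrase query per chunk.

    Sentences longer than max_words are cut into consecutive chunks.
    """
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    queries = []
    for sentence in sentences:
        words = sentence.split()
        for start in range(0, len(words), max_words):
            chunk = " ".join(words[start:start + max_words])
            if chunk:
                queries.append('"' + chunk + '"')
    return queries


for query in search_queries("Insulin is a peptide hormone. It regulates blood glucose."):
    print(query)
```

Each query is wrapped in double quotes so that it can be pasted into a search box as an exact phrase. Querying one sentence at a time fits the observation that single sentences are often lifted from several different papers.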
I’ve done that a few times myself. Really excellent sentences followed by complete gibberish, followed by fully legible English again. Google the good sentences, and in most cases: *ping* – plenty of hits. Not uncommonly, I also found several papers with the same sentence.
Still, at the same time I’d rather be cited correctly by someone copying an entire paragraph I wrote (which has happened) than incorrectly by someone just making random statements followed by a reference to my paper that actually says something completely different (which has also happened).
If anyone can take 32 words from my paper and google them to find who originally said it, why do I need to reference it and put it in quotes? Maybe this is just anachronistic. Maybe in future I can have a program which automatically provides references for all my sentences!