Compression plagiarism: An “under-recognized variety” that software will miss

Michael Dougherty

If you’re interested in plagiarism in the scholarly literature nowadays, you’ve probably come across the name Michael Dougherty. Dougherty’s efforts to root out plagiarism have led to dozens of retractions, including several by a prominent priest. In a new paper in Argumentation, Dougherty, author of the recent book Correcting the Scholarly Record for Research Integrity: In the Aftermath of Plagiarism, has coined a new term: “compression plagiarism.” We asked him more about the phenomenon, which Dougherty says “is invisible to unsuspecting readers and immune to anti-plagiarism software.”

Retraction Watch (RW): You define a term that is new to us: Compression plagiarism. What is compression plagiarism, and why is it particularly problematic?

Michael Dougherty (MD): I use the term “compression plagiarism” to describe a phenomenon I have been seeing in the published research literature: the distillation of a lengthy scholarly text into a short one, with the short one published under different authorship. The typical case involves the compression of a book into an article, so that the article is pieced together from sentences extracted from the beginning, middle, and end of the book. Sometimes the book’s paragraphs are abbreviated into sentences, or long sentences are abbreviated into short ones. Cases of compression plagiarism pose significant challenges for readers and editors, since these cases are apparently immune to standard text-matching software, invisible to unsuspecting readers, and often unrecognized even by those familiar with the original source text. I have seen cases where researchers cite both the source text and the plagiarizing article together; such researchers apparently have missed that the shorter work is a compression of the longer one.

RW: Can you provide an example of what a case might look like?

MD: The first part of my paper in the journal Argumentation describes in general the phenomenon of compression plagiarism, and the second part examines a suspected case of it. I argue that a short 4.5-page article by “N.” that appeared in the same journal in 2006 appears to be a compression of an uncited 490-page 1992 book in German by the philosopher Stefan Gosepath. In this case, we are dealing with both suspected compression plagiarism and suspected translation plagiarism in the same article. Here are two of the pieces of evidence that I include in the paper, showing that texts in the 2006 article appear to derive from the beginning and middle of Gosepath’s uncited lengthy book:

As I observe in my paper, if this case of suspected compression plagiarism is, in fact, a demonstrated case of compression plagiarism, then the following unpleasant claims are true. First, Gosepath, the original author, is denied credit for his 1992 original work when it re-appears in compressed form under N.’s name in the 2006 article. Second, Gosepath’s work therefore has a double representation in the body of published research literature: first as Gosepath 1992 and then as N. 2006. Readers unwittingly encounter the arguments of Gosepath through the proxy of N. 2006, and this phenomenon engenders errors about who the principal interlocutors in the academic debate truly are. Furthermore, there is an ongoing severe corruption of the downstream literature, since when other researchers positively cite N. 2006, they unwittingly credit to N. what should be credited to Gosepath.

RW: How common do you believe compression plagiarism is?

MD: In the larger context of academic writing, probably not too common, as even garden-variety copy-and-paste plagiarism is relatively uncommon in the sense that the overwhelming majority of research articles are not, of course, plagiarized. But in the last decade, with the help of colleagues, I’ve sent more than a hundred retraction requests to editors and publishers in philosophy and related disciplines, so the problem of plagiarism in the published research literature is not negligible. Only a handful of these cases would qualify as compression plagiarism. Now that the phenomenon has been identified, I suspect more cases will come to light. I am certainly looking for them now.

RW: What is the relationship between translation and compression plagiarism?

MD: When an article exhibits both translation plagiarism and compression plagiarism at the same time, the plagiarism is very difficult to see. A colleague working on identifying cases of plagiarism in German philosophical articles encouraged me to search for German sources for some English articles that I suspected were plagiarized. Her advice was excellent. We have had success in identifying more than a dozen articles exhibiting severe plagiarism in recent months, and we hope to report these findings soon. Some of these cases involve complex forms of plagiarism with sources in German and English. Two cases of translation plagiarism I worked on turned out to be also cases of apparent book-to-article compression plagiarism. I’m speaking at the 2019 COPE North American Seminar on the topics of compression plagiarism and translation plagiarism, so I hope to raise awareness of these subtler forms of plagiarism in my discipline of philosophy.

RW: Your paper describes “a suspected instance of compression plagiarism that appeared within the pages of this journal.” You don’t name the author, although you provide the references, making him easy to identify. Why did you take that approach?

MD: My interest is in plagiarism, rather than plagiarists. In this paper, as well as in a recent book I wrote on plagiarism in philosophy, I don’t identify by name any suspected plagiarists, but instead direct my analyses to suspected acts of plagiarism. In this paper I simply refer to the author of record as “N.” Yes, it is easy to identify the authors of record for the plagiarism cases that I have discussed in print, and interested readers can track them down. In this case, N. is familiar to Retraction Watch readers, as some of the 12 retractions, errata, and corrigenda N. has earned have been covered here. It is not always possible to avoid naming suspected plagiarists, however, so I have named names in some co-authored articles on other cases. By publishing the evidence of suspected compression plagiarism in the pages of the journal that first published the article in question, my paper can be seen as an alternate avenue for plagiarism whistleblowing. Past experience has shown me that traditional routes of whistleblowing are not always successful, since the home institutions of suspected plagiarists sometimes display unusual behavior.

RW: Tell us about the case. How has the journal responded to the allegations? (It seems odd that they would publish your piece but not retract, no?)

MD: I am grateful to the editor and the reviewers at Argumentation for supporting my paper. Not every journal would welcome a paper that points out a suspected case of plagiarism in a previous issue of the journal. My initial submission of the manuscript in November 2018 to Argumentation included a request that the journal retract the article in question based on the evidence of suspected plagiarism I provided in my manuscript. Last week I wrote to the editor again, but I received only an out-of-office reply. In the last round of anonymous peer review prior to publication, one reviewer stated that I had “established such a strong case that the burden of proof is now on ‘N.’ to deny that the essay in question contains plagiarism.” The editor of Argumentation has responded favorably to similar requests in the past; in 2015, I reported evidence of suspected plagiarism in two articles by the same author of record to the editor, resulting in the publication of a retraction and an erratum later that year. The erratum in that case was somewhat unusual as it contained the entire text of the researcher’s earlier published 10-page article, but this time the text was supplemented with newly added quotation marks, in-text citations, and an expanded bibliography that incorporated all of the sources I had identified as missing in the original version. I am interested to see how the journal will proceed with this new case.

RW: You write that “It is a mistake to identify plagiarism exclusively with its most obvious form, however.” Explain.

MD: In my experience, most people identify plagiarism with the obvious copy-and-paste variety. But the word plagiarism is an equivocal expression that can apply to articles displaying a wide variety of serious authorship violations. Theorists including Debora Weber-Wulff have done great work in proposing typologies of plagiarism that include subtle, disguised forms of plagiarism.

RW: You stress the importance of retracting papers that plagiarize, but note that not all such papers are retracted. Why do you think that’s the case?

MD: I’m not sure I have a good answer. Perhaps some editors still think that the issuance of a retraction is something shameful or is a concession of some embarrassing past editorial failure. When I see that a journal has issued a retraction, my estimation of the journal improves; a retraction is evidence that a journal is committed to correcting the scholarly record, which ensures the reliability of the published research literature for students and researchers. I worry also that some journal editors might be enforcing a statute of limitations for plagiarism cases. Subtle forms of plagiarism can take many years – if not decades – to be discovered, so statutes of limitation impede corrections. Nevertheless, I am hopeful that the situation will improve.

2 thoughts on “Compression plagiarism: An “under-recognized variety” that software will miss”

  1. We have another case that cannot be easily detected by straightforward syntax-based plagiarism detectors. We call it “Content and structure plagiarism by simple sentence rephrasing”. This kind of plagiarism, like compression plagiarism, is undetectable by Turnitin (http://www.turnitin.com), but it becomes obvious once the sentences of the plagiarizing paper are aligned with those of the original paper. We have posted this case on PubPeer (https://pubpeer.com/publications/AD5EDF916EBF04B1DCDBC03C45493E) so that more people are able to detect plagiarism beyond syntax similarities.

  2. Not all plagiarism can be caught by programs since some plagiarism involves rewriting another scholar’s argument using new words. Technology has made plagiarism a much more pervasive problem but I wonder if one reason compression plagiarism has been overlooked is that philosophers tend to be more sanguine about plagiarism of ideas than scholars in other professions. If one paper is a compressed version of another it seems both would make the same argument. You wonder why this borrowing wasn’t spotted or whether it was, and the plagiarism was deemed too hard to prove due to lack of syntax similarities (the ‘smoking gun’ I take it).

    I look forward to reading your book, Prof. Dougherty. Thank you for your work on this.
