A graduate student at McGill University is raising concerns that a popular F1000Research paper may have plagiarized his 2014 blog post that — ironically — proposed a method to prevent scientific misconduct. The student calls the paper “a mirror image” of his work.
The February 2016 F1000Research paper, “How blockchain-timestamped protocols could improve the trustworthiness of medical science,” was highlighted by us earlier this year, as well as by The Economist and FierceBiotech. In the paper, physician Greg Irving of the University of Cambridge and John Holden of Garswood Surgery in the UK describe a proof of concept of how to use a blockchain, a technology best known for powering the digital currency bitcoin, to audit scientific studies and prevent misconduct in clinical trials.
After the student brought his concerns to the journal, Irving and Holden published a second version of their paper online, this time repeatedly citing the blog entry and altering language that had been identical between the two pieces. F1000Research says “the scientific content is still valid” and has no plans to retract the article. Two public peer reviewers of the work also stand by its validity.
The student who initially raised objections about the paper is Benjamin Carlisle, a doctoral candidate studying biomedical ethics at McGill. In 2014, he wrote “Proof of prespecified endpoints in medical research with the bitcoin blockchain” on his blog, “The Grey Literature.”
Carlisle said he believes that even the updated version of the F1000Research paper still meets the Committee on Publication Ethics’ (COPE) definition of clear plagiarism: “unattributed use of large portions of text and/or data, presented as if they were by the plagiarist.” Carlisle told Retraction Watch:
To me, the F1000 piece looks like a mirror image of my blog entry. Its structure and length look very similar to me, as do the concept, moral rationale, lexicon, and numerous passages.
For example, in the Conclusion section of the blog entry, Carlisle wrote:
Fraud in scientific methods erodes confidence in the medical research establishment, which is essential to it performing its function.
In the Discussion section of Irving and Holden’s second version of the paper, they write:
Fraud in scientific methods erodes confidence in medicine as a whole which is essential to performing its function.
According to Copyscape, an online plagiarism detection site, the first version of the F1000Research paper and the blog entry share 8-9% word-for-word matching content, including describing a blockchain as:
a distributed, permanent, timestamped public ledger
provides a method for establishing the existence of a document at a particular time that can be independently verified by any interested party
Carlisle didn’t just criticize the article for being too similar to his own work — in fact, he said he believes the spots where the articles differ betray Irving and Holden’s lack of familiarity with the technology.
A blockchain relies on distributing among computer users a single file that is time-stamped every time it is altered, creating a list of changes that can be independently tracked and verified by any participant. In his blog entry, Carlisle proposes using an unformatted text file to contain clinical trial information. Irving and Holden, on the other hand, use a Microsoft Word file as their exemplar file, with text formatted in tables.
Jameson Lopp, a blockchain expert and software engineer at bitcoin security firm BitGo, who was not involved with either article, told us the use of a Microsoft Word file with tables is “weird.” He said:
I don’t know how they could consider it “unformatted.” I’m also unclear as to what happens under the hood when you copy/paste a selection from a Word document – I would be surprised if it didn’t also copy additional formatting data, which could cause a different hash to be generated on different computers.
In other words, use of such a file could produce a different version for each user, he said, which undermines the blockchain method that relies on every user being able to track and verify the same changes on the same document.
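Lopp's point about hashes can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function, which this article does not specify, and the protocol text and the `fingerprint` helper are hypothetical, invented only for illustration. It shows two copies that read identically on screen but differ in invisible characters, and so yield different digests.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return the SHA-256 hex digest of the text's UTF-8 bytes.

    SHA-256 is an assumption made for this sketch.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical protocol text, invented for illustration only.
plain = "Primary endpoint: all-cause mortality at 12 months\n"

# The same words as they might arrive after a copy/paste that alters
# invisible details: a trailing space and a Windows-style line ending.
pasted = "Primary endpoint: all-cause mortality at 12 months \r\n"

# Comparing word by word ignores whitespace, so the two copies "read" the same.
same_words = plain.split() == pasted.split()

# Hashing compares the underlying bytes, so the invisible difference matters.
same_hash = fingerprint(plain) == fingerprint(pasted)

print("same words:", same_words)
print("same hash:", same_hash)
print(fingerprint(plain))
print(fingerprint(pasted))
```

Hashing byte-identical text always gives the same digest, which is what would let any interested party check a document independently. A file format that carries hidden formatting data makes byte-identical copies harder to guarantee.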
Irving and Holden’s original version of the paper did not cite Carlisle’s blog post. On May 14, 2016, Carlisle told us he contacted F1000Research with his concerns, requesting an investigation to see if plagiarism had occurred. Almost two weeks later, Irving and Holden published a new version of their paper online, amending it with:
The method we tested here was first proposed by Carlisle in the grey literature. Clear reference to the previously described method (Reference 6) has been added throughout the revised article.
The second version describes the method as “first reported by Carlisle” and cites his blog entry eight times. The new version also changes the language in several sections, including re-defining a blockchain as
a distributed, tamper proof public ledger of timestamped transactions.
Carlisle told us he is not satisfied with the changes:
First, ex post facto citation would not undo the misconduct of plagiarism, if it were deemed to have occurred. According to COPE, corrections are only warranted with small passages (e.g. a few sentences in the discussion) of unattributed parallel text. The COPE guidelines also say “Publications should be retracted as soon as possible after the journal editor is convinced that the publication is seriously flawed and misleading (or is redundant or plagiarised).”
…Second, in my view, the revisions concede my concerns. Version 1 of the Introduction seems to claim invention: “Here we propose using a ‘blockchain’ …” Version 2 of the Introduction says, “Here we confirm the use of blockchain …” In version 1, the Author Contributions section says, “GI conceived the study. GI designed the experiments.” This is removed from version 2. Readers can decide whether the “confirmation” represents a real contribution. To me, it feels a bit like publishing a cookie recipe you found on the web, and then trying to claim credit because you were the first to document the actual baking.
Irving and Holden declined to comment for this article, stating in an email:
We have not heard directly from Mr Carlisle since our amended paper appeared recently in F1000. We would expect him to raise any concerns with us himself in the normal manner of academic dialogue.
Ruth Francis, communications director for F1000Research, told us the original paper should have referred to Carlisle’s post, but the journal has no plans to retract the paper:
In response to questions that have arisen, the authors have revised their article and a new version has been published along with a statement explaining the corrections. The article demonstrates empirical proof by testing a method that was proposed in the grey literature; the revised version now clearly cites and discusses this source, which, given the information now available to us, we believe should have been acknowledged from the start.
…The article’s findings and conclusions remain unchanged and the scientific content is still valid. In following the COPE guidelines, we do not believe that a Retraction from the scientific literature is appropriate in this instance, as they advise editors “that the main purpose of retractions is to correct the literature” and to “consider whether readers are best served if the entire article is retracted”.
We have also submitted a case to the COPE Forum for independent advice on this case.
Charilaos Lygidakis, a PhD student at the University of Luxembourg who peer-reviewed the second version of the paper for F1000Research, said he was not concerned by the amendments. He told us:
First, blog posting is just equivalent to idea sharing and brainstorming; I don’t believe it is feasible for researchers to search thoroughly all the information that is published on blogs and other grey literature to see whether others have already written on their topic.
Second, I have had the opportunity to follow the work of the authors, and I hold them in very high regard for their scientific and professional ethos.
Carlisle argued that blog posts deserve the same protections as published papers:
To speak in general terms, taking large passages of text from another source without attribution, whether or not it has been peer-reviewed, is non-controversially plagiarism. Newer scientific communication modes, like blogging, deserve the same protections as conventional media like journal publication.
Amy Price of the University of Oxford reviewed the first version of the paper for F1000Research, and after reading the revised version, told us:
I would argue that the authors were overly generous in crediting the concept to Mr. Carlisle. I don’t think that this blog would meet considerations for authorship in the paper even according to COPE…. The blog suggests a use for a concept based on open access tools in the public domain since 2009 but does not operationalize it. An idea or a tool is not research. To assign that or name it as research is quite a stretch and would not be accepted in an academic or even intellectual property sense.
She disputes Carlisle’s argument that the paper is like a “mirror image” of his blog entry. She argued:
In the blog the morale was not developed or even referenced. It seems to be a common and undeveloped idea for clinical trial registrations. The authors of the actual paper suggested several uses that the blogger failed to consider. The blog contains no supplementary materials and yet the research paper does. This puts to rest the length and similarity argument. It is concerning that this blogger would be attempting a doctorate and comfortably ascribe to a colleague such falsehoods and such a serious allegation as to cause this a “mirror” image.
As for the language overlap, Price said:
This language is a common IT description and the standard way of explaining the storage of data, a method of extractions and a protocol for time stamping it.
We asked bitcoin expert Lopp about the language issue, and he told us:
It is all standard terminology, but while one or two sentences that are nearly identical could be understandable, that many similarities appears a bit too much to be coincidence.
Update 8/17/16 4:10 p.m. eastern: We’ve come across an anonymized COPE case study that appears to describe this story. Among the advice:
On a poll of the Forum audience, the majority agreed that a correction seems to be the appropriate (non-punitive) action (compared with a handful who favoured retraction); a correction also serves the student’s rights by indicating clearly where the ideas originated, and maintaining in the literature the work that validates those ideas. The Forum believed that the editors were correct in the course of action they took, and the requirement that the blog concept be clearly recognized.
The Forum discussed if this was plagiarism. There was certainly plagiarism of ideas and the Forum noted that there should be awareness of “ownership of ideas”. Transparency is key in these scenarios and ideas need to be properly credited. Some argued that the article adds something new (validation) and major correction (to address the unattributed copying via proper reference and attribution) undoes the “harm” done by the absence of attribution.
However, some of the members of the Forum were concerned about the apparent deception—the authors did present the method as their own. They recommended that the journal contact the author’s institution. However, it is a judgment call for the editor as to whether the institution is contacted. The institution might appreciate knowing so they can build guidance on citing grey literature into their teaching/training.
Hat tip: Anna Powell-Smith