Retraction Watch

Tracking retractions as a window into the scientific process

Plagiarism concerns raised over popular blockchain paper on catching misconduct


A graduate student at McGill University is raising concerns that a popular F1000Research paper may have plagiarized his 2014 blog post that, ironically, proposed a method to prevent scientific misconduct. The student calls the paper “a mirror image” of his work.

The February 2016 F1000Research paper, “How blockchain-timestamped protocols could improve the trustworthiness of medical science,” was highlighted by us earlier this year, as well as by The Economist and FierceBiotech. In the paper, physician Greg Irving of the University of Cambridge and John Holden of Garswood Surgery in the UK describe a proof-of-concept of how to use a blockchain (a technology best known for powering the digital currency bitcoin) to audit scientific studies and prevent misconduct in clinical trials.

After the student brought his concerns to the journal, Irving and Holden published a second version of their paper online, this time prolifically citing the blog entry and altering language that had been identical between the two pieces. F1000Research says “the scientific content is still valid” and has no plans to retract the article. Two public peer reviewers of the work also stand by its validity.

The student who initially raised objections about the paper is Benjamin Carlisle, a doctoral candidate studying biomedical ethics at McGill. In 2014, he wrote “Proof of prespecified endpoints in medical research with the bitcoin blockchain” on his blog, “The Grey Literature.”

Carlisle said he believes that even the updated version of the F1000Research paper still meets the Committee on Publication Ethics’ (COPE) definition of clear plagiarism: “unattributed use of large portions of text and/or data, presented as if they were by the plagiarist.” Carlisle told Retraction Watch:

To me, the F1000 piece looks like a mirror image of my blog entry. Its structure and length look very similar to me, as do the concept, moral rationale, lexicon, and numerous passages.

For example, in the Conclusion section of the blog entry, Carlisle wrote:

Fraud in scientific methods erodes confidence in the medical research establishment, which is essential to it performing its function.

In the Discussion section of Irving and Holden’s second version of the paper, they write:

Fraud in scientific methods erodes confidence in medicine as a whole which is essential to performing its function.

According to Copyscape, an online plagiarism detection site, the first version of the F1000Research paper and the blog entry share 8-9% word-for-word matching content, including describing a blockchain as:

a distributed, permanent, timestamped public ledger

that

provides a method for establishing the existence of a document at a particular time that can be independently verified by any interested party

Carlisle didn’t just criticize the article for being too similar to his own work — in fact, he said he believes the spots where the articles differ betray Irving and Holden’s lack of familiarity with the technology.

A blockchain relies on distributing a single file among computer users that is time-stamped every time it is altered, creating a list of changes that can be independently tracked and verified by any participant. In his blog entry, Carlisle proposes using an unformatted text file to contain clinical trial information. Irving and Holden, on the other hand, use a Microsoft Word file as their exemplar file, with text formatted in tables.

Jameson Lopp, a blockchain expert and software engineer at bitcoin security firm BitGo, who was not involved with either article, told us the use of a Microsoft Word file with tables is “weird.” He said:

I don’t know how they could consider it “unformatted.” I’m also unclear as to what happens under the hood when you copy/paste a selection from a Word document – I would be surprised if it didn’t also copy additional formatting data, which could cause a different hash to be generated on different computers.

In other words, use of such a file could produce a different version for each user, he said, which undermines the blockchain method that relies on every user being able to track and verify the same changes on the same document.
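Lopp’s point about hashes can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and uses a made-up two-line protocol; neither detail comes from the paper or the blog entry. The three inputs look the same on screen, but they differ in their underlying bytes (line endings, or one non-breaking space of the kind a word processor might insert), so each produces a different digest.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical protocol text, invented for illustration only.
protocol = "Primary endpoint: all-cause mortality at 12 months\nSample size: 200\n"

# Three byte-level variants of the same visible text.
plain = protocol.encode("utf-8")                           # plain text, Unix line endings
crlf = protocol.replace("\n", "\r\n").encode("utf-8")      # Windows line endings
nbsp = protocol.replace(" ", "\u00a0", 1).encode("utf-8")  # first space made non-breaking

digests = {
    "plain": sha256_hex(plain),
    "crlf": sha256_hex(crlf),
    "nbsp": sha256_hex(nbsp),
}

for name, digest in digests.items():
    print(f"{name:5s} {digest}")
```

Anyone re-hashing the exact same bytes gets the same digest, which is what makes independent verification possible. Any invisible difference in the bytes breaks that match.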

Irving and Holden’s original version of the paper did not cite Carlisle’s blog post. On May 14, 2016, Carlisle told us he contacted F1000Research with his concerns, requesting an investigation to see if plagiarism had occurred. Almost two weeks later, Irving and Holden published a new version of their paper online, amending it with:

The method we tested here was first proposed by Carlisle in the grey literature. Clear reference to the previously described method (Reference 6) has been added throughout the revised article.

The second version describes the method as “first reported by Carlisle” and cites his blog entry eight times. The new version also changes the language in several sections, including re-defining a blockchain as

a distributed, tamper proof public ledger of timestamped transactions.

Carlisle told us he is not satisfied with the changes:

First, ex post facto citation would not undo the misconduct of plagiarism, if it were deemed to have occurred. According to COPE, corrections are only warranted with small passages (e.g. a few sentences in the discussion) of unattributed parallel text. The COPE guidelines also say “Publications should be retracted as soon as possible after the journal editor is convinced that the publication is seriously flawed and misleading (or is redundant or plagiarised).”

…Second, in my view, the revisions concede my concerns. Version 1 of the Introduction seems to claim invention: “Here we propose using a ‘blockchain’ …” Version 2 of the Introduction says, “Here we confirm the use of blockchain …” In version 1, the Author Contributions section says, “GI conceived the study. GI designed the experiments.” This is removed from version 2. Readers can decide whether the “confirmation” represents a real contribution. To me, it feels a bit like publishing a cookie recipe you found on the web, and then trying to claim credit because you were the first to document the actual baking.

Irving and Holden declined to comment for this article, stating in an email:

We have not heard directly from Mr Carlisle since our amended paper appeared recently in F1000. We would expect him to raise any concerns with us himself in the normal manner of academic dialogue.

Ruth Francis, communications director for F1000Research, told us the original paper should have referred to Carlisle’s post, but the journal has no plans to retract the paper:

In response to questions that have arisen, the authors have revised their article and a new version has been published along with a statement explaining the corrections. The article demonstrates empirical proof by testing a method that was proposed in the grey literature; the revised version now clearly cites and discusses this source, which, given the information now available to us, we believe should have been acknowledged from the start.

…The article’s findings and conclusions remain unchanged and the scientific content is still valid. In following the COPE guidelines, we do not believe that a Retraction from the scientific literature is appropriate in this instance, as they advise editors “that the main purpose of retractions is to correct the literature” and to “consider whether readers are best served if the entire article is retracted”.

We have also submitted a case to the COPE Forum for independent advice on this case.

Charilaos Lygidakis, a PhD student at the University of Luxembourg who peer-reviewed the second version of the paper for F1000Research, said he was not concerned by the amendments. He told us:

First, blog posting is just equivalent to idea sharing and brainstorming; I don’t believe it is feasible for researchers to search thoroughly all the information that is published on blogs and other grey literature to see whether others have already written on their topic.

Second, I have had the opportunity to follow the work of the authors, and I hold them in very high regard for their scientific and professional ethos.

Carlisle argued that blog posts should deserve the same protections as published papers:

To speak in general terms, taking large passages of text from another source without attribution, whether or not it has been peer-reviewed, is non-controversially plagiarism. Newer scientific communication modes, like blogging, deserve the same protections as conventional media like journal publication.

Amy Price of the University of Oxford reviewed the first version of the paper for F1000Research, and after reading the revised version, told us:

I would argue that the authors were overly generous in crediting the concept to Mr. Carlisle. I don’t think that this blog would meet considerations for authorship in the paper even according to COPE…. The blog suggests a use for a concept based on open access tools in the public domain since 2009 but does not operationalize it. An idea or a tool is not research. To assign that or name it as research is quite a stretch and would not be accepted in an academic or even intellectual property sense.

She disputes Carlisle’s argument that the paper is like a “mirror image” of his blog entry. She argued:

In the blog the morale was not developed or even referenced. It seems to be a common and undeveloped idea for clinical trial registrations. The authors of the actual paper suggested several uses that the blogger failed to consider. The blog contains no supplementary materials and yet the research paper does. This puts to rest the length and similarity argument. It is concerning that this blogger would be attempting a doctorate and comfortably ascribe to a colleague such falsehoods and such a serious allegation as to cause this a “mirror” image.

As for the language overlap, Price said:

This language is a common IT description and the standard way of explaining the storage of data, a method of extractions and a protocol for time stamping it.

We asked bitcoin expert Lopp about the language issue, and he told us:

It is all standard terminology, but while one or two sentences that are nearly identical could be understandable, that many similarities appears a bit too much to be coincidence.

Update 8/17/16 4:10 p.m. eastern: We’ve come across an anonymized COPE case study that appears to describe this story. Among the advice:

On a poll of the Forum audience, the majority agreed that a correction seems to be the appropriate (non-punitive) action (compared with a handful who favoured retraction); a correction also serves the student’s rights by indicating clearly where the ideas originated, and maintaining in the literature the work that validates those ideas. The Forum believed that the editors were correct in the course of action they took, and the requirement that the blog concept be clearly recognized.

The Forum discussed if this was plagiarism. There was certainly plagiarism of ideas and the Forum noted that there should be awareness of “ownership of ideas”. Transparency is key in these scenarios and ideas need to be properly credited. Some argued that the article adds something new (validation) and major correction (to address the unattributed copying via proper reference and attribution) undoes the “harm” done by the absence of attribution.
However, some of the members of the Forum were concerned about the apparent deception—the authors did present the method as their own. They recommended that the journal contact the author’s institution. However, it is a judgment call for the editor as to whether the institution is contacted. The institution might appreciate knowing so they can build guidance on citing grey literature into their teaching/training.

Hat tip: Anna Powell-Smith


Comments
  • fernandopessoa July 14, 2016 at 9:57 am

    Amy Price, reviewer of the first version of the paper.

    “It seems to be a common and undeveloped idea for clinical trial registrations.”
    Then it is not the idea of the authors of the paper either.

    “The authors of the actual paper suggested several uses that the blogger failed to consider.”
    Some of the uses were the same.

  • herr doktor bimler July 14, 2016 at 2:53 pm

    I don’t believe it is feasible for researchers to search thoroughly all the information that is published on blogs and other grey literature to see whether others have already written on their topic

    The reviewer is missing the point that the researchers evidently did search information that had been published on blogs; perhaps not thoroughly, but thoroughly enough to find Carlisle’s contribution.

  • Warrick July 14, 2016 at 3:56 pm

    This seems to me very similar to shoplifting and then attempting to pay for the goods once caught while insisting innocence.

    • AABB July 14, 2016 at 4:04 pm

      … and attempting to pay with ‘thank you’s rather than money, or another appropriate currency (ie. authorship credit).

      • Kez July 15, 2016 at 3:51 am

        This ^ and This ^^

  • PWK July 15, 2016 at 1:46 am

    Plagiarism is generally defined as taking someone’s work *or ideas* and passing them off as one’s own. The focus on text similarity misses the point.

    • Marco July 18, 2016 at 12:42 pm

      In this case the high textual similarity actually means that Irving & Holden have a harder time claiming they did not know about Carlisle’s ideas, which in this case is rather important for the claim of plagiarism. It is always possible two different people come with the same idea around the same time, without the two knowing about each other. In this latter case it would be inappropriate to make a claim of plagiarism, and pointing out, at a later time point, that someone apparently had come with the same idea a bit earlier, is perfectly OK.

      • LE Parker July 18, 2016 at 3:13 pm

        “It is always possible two different people come with the same idea around the same time, without the two knowing about each other. In this latter case it would be inappropriate to make a claim of plagiarism”
        If this were the case, wouldn’t it also be inappropriate to retroactively cite (in this case profusely) the other person’s work? The primary objective of the references section is to direct readers to sources that the authors consulted while preparing the manuscript. The CSE manual Scientific Style and Format says “References fulfill 2 essential roles in the research and publishing process, ensuring intellectual integrity by (1):
        • Giving credit to those individuals and organizations whose published works have contributed to the research being reported
        • Providing users of references with sufficient information to uniquely identify and locate a published work”

  • Kenneth L Busch July 15, 2016 at 8:55 am

    Allegations of research misconduct (here it would seem to be plagiarism) are usually handled by an inquiry and then investigation (if warranted) by the appropriate institution. That confidential process establishes the facts of the matter in an objective fashion, and avoids the public point/counterpoint exchanges described in this posting.

    • Neuroskeptic July 15, 2016 at 10:07 am

      The facts, fortunately, are available for anyone to verify, simply by reading the paper in question and the related blog post.

    • Jonathan Kimmelman July 18, 2016 at 11:40 am

      Right. My student requested an investigation at F1000 in May, and we are now in late July. COPE urges swift investigations. F1000- which has branded itself as a rapid review and publication platform- has been anything but rapid.

  • Nick July 20, 2016 at 4:56 am

    Does anyone else find it hilarious that Irving and Holden (now) say that “The method we tested here was first proposed by Carlisle in the grey literature”, when Carlisle’s blog is literally called “The Grey Literature”? In other news, there is apparently a predatory journal called “Nature and Science”. I look forward to Irving and Holden publishing their next great ideas there, so that they can tell everyone that “we have been published in Nature and Science”.

    As for this:
    “Irving and Holden declined to comment for this article, stating in an email:
    We have not heard directly from Mr Carlisle since our amended paper appeared recently in F1000. We would expect him to raise any concerns with us himself in the normal manner of academic dialogue.”
    I would like to say that I’m speechless, but this is all too common. Two senior people invite the poor grad student whose ideas they ripped off to engage in academic dialogue with them. Next time someone burgles my house, should I go round to them and ask if they would like to discuss shared ownership of my TV? After all, calling the police is so unconducive to dialogue.

  • Benjamin Carlisle July 20, 2016 at 9:01 am

    I was feeling snarky and renamed my blog “The Grey Literature, apparently” when Irving and Holden used a non-standard citation format to cite me in version 2 that omitted the name of my blog, the title of my post and the date it was published.

  • Sandy Deklerk August 8, 2016 at 8:51 pm

    The applications of blockchain technology are endless and, given the open source nature of the protocol, anyone can use the technology to create their own applications. However, the publish-or-perish environment in scientific research sometimes forces researchers to publish incomplete or plagiarized papers, which may eventually create more problems instead of solving them.

  • Daniel Himmelstein March 8, 2017 at 7:42 pm

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA512

    Thanks for the great coverage of the plagiarism issues with Irving & Holden’s study.

    As I explore in a blog post (http://blog.dhimmel.com/irreproducible-timestamps/), there are additional issues. It appears that the authors completely botch the timestamping implementation. As a result, there is no record of the clinical trial protocol in the bitcoin blockchain.

    I’m also concerned by the paper’s statement that a “second researcher” replicated the address generation, since the flawed implementation means this would be a mathematical impossibility.

    Finally, this is an extremely interesting case of irreproducibility. Irving & Holden have, presumably unintentionally, deposited a ~$50 bounty to reproduce their analysis. The bounty remains unclaimed more than a year after its public disclosure.
    -----BEGIN PGP SIGNATURE-----
    Version: Keybase OpenPGP v2.0.64
    Comment: https://keybase.io/crypto

    wsBcBAABCgAGBQJYwKYWAAoJEAOot/hH7qRdPTYIANxEY99m8NA5+RQw14+rwjvT
    e5ED1WgCLA2RXB1/d7XM7F7ePip8z8A+glmSELGYzSn0kFq77Jd471FqT3MO7pID
    H9M1QJ0JQ4DbD7+8sCATSPkqaZB1EPpxvzVhuf6wdKABdNDXVMSTrnxfc7UiscFd
    foTgl/7r8B4NNPWSAzrKmCAMOMYoshN0JnHKbRF9iYVUoHhoJemT4SSGPW0sXToi
    oGPieAIgh9++TJ3E5xOofRZuOJdTrX5cFqrAmrjsT2YiRwQXyX/BIsEzkqq26jxl
    aHEXDvYI9E7Cdh585Ba1kJf1bUUP8tsjz0jCjMI78ZGUACazpSzVyJIS1zV8VMY=
    =adBu
    -----END PGP SIGNATURE-----

  • Jordan Anaya May 26, 2017 at 3:10 am

    Although the COPE case study currently appears to be anonymized, it was not, as evidenced by this archived version of the report which states the forum was made aware of this Retraction Watch story:
    https://web.archive.org/web/20160820144820/http://publicationethics.org/case/what-extent-plagiarism-demands-retraction-vs-correction
