The authors of a popular — and heavily debated — F1000Research paper proposing a method to prevent scientific misconduct have decided to retract it.
The paper was initially criticized for allegedly plagiarizing from a graduate student’s blog — and revised to try to “rectify the overlap.” But according to F1000, it is now being retracted after an additional expert identified problems with the methodology.
Today, F1000 added this editorial note to the paper:
Due to the methodological concerns raised by a peer reviewer during the post-publication open peer review process, the authors will retract this article from F1000Research. The formal retraction note will be posted in due course.
The paper, “How blockchain-timestamped protocols could improve the trustworthiness of medical science,” caught the media’s attention after it first appeared in February 2016, receiving mentions in The Economist and FierceBiotech (as well as our site). In the paper, physician Greg Irving of the University of Cambridge and John Holden of Garswood Surgery in the UK described how to use a blockchain, the technology that powers the digital currency bitcoin, to audit scientific studies and prevent misconduct in clinical trials.
Sabina Alam, Editorial Director at F1000, sent us this statement:
We were alerted to concerns about the methods and scientific validity of this article in a comment posted on the article. In the interest of completeness of the peer review process and addressing these concerns we invited a fourth peer reviewer, William J. Knottenbelt, an expert in cryptocurrency. Professor Knottenbelt submitted a peer review report stating that the methodology was not correct. Upon reading this peer review report the authors requested that the article be retracted. We have now placed an editorial note on the paper to notify readers it will be retracted. A full retraction notice will be posted on our site soon and we will work with PubMed to have all versions of the article indexed there retracted. As we have Crossmark implemented throughout our site, it should be clear that the paper is retracted, no matter what version people access.
That additional expert, William Knottenbelt at Imperial College London, told us he agreed with the authors’ decision:
I think the sensible thing for them to do is to retract it.
Knottenbelt, a computing expert, said he believes the authors misinterpreted one step of their methodology:
It was an understandable confusion, because this whole area is very complicated.
He added that he wasn’t surprised the initial reviewers of the paper missed the problem as well, given the varied expertise needed to review a paper on this topic:
I think if people are going to work on this kind of stuff then they need to bring together the right combination of multidisciplinary expertise.
It wasn’t methodology concerns that initially sparked the debate about this paper — it was the allegation by Benjamin Carlisle, a doctoral candidate studying biomedical ethics at McGill, that the paper had plagiarized his 2014 blog post. Even after the authors updated the original paper to try to address the overlap, Carlisle told us last July that he still believed the new version was a “mirror image of my blog entry.”
Carlisle’s advisor, Jonathan Kimmelman, told us today he suspected the plagiarism allegations may have ultimately prompted this retraction:
I think the plagiarism allegations probably brought much more careful scrutiny to this article than would have otherwise occurred.
Kimmelman added that he thought the paper should have been retracted earlier for plagiarism alone, but is happy it has finally happened:
This has been a long process, and I’m pleased to see this outcome.
Knottenbelt concluded that this retraction was an example of how publishing should happen — authors release findings, then retract them if outside experts uncover errors:
This is how the scientific process is supposed to work…That’s what peer review is for.
Update, 4:00 p.m. Eastern, 5/24/17: Daniel Himmelstein, who had commented on the F1000 paper, connects some dots for us in the comments, noting that Knottenbelt’s “review reaches the same conclusions as my blog post 3 months earlier.” He’s talking about this blog post, which we highlighted in Weekend Reads a few days after its publication.
Update, 9:07 p.m. Eastern, 5/25/17: We’ve heard from author John Holden, who told us:
The paper was revised after consultation with others and following considerable efforts to provide a method that was entirely reliable.
When it became clear that this could not be achieved it was decided that retraction was scientifically the right course of action to take.
Knottenbelt was a lot less circumspect in his actual review:
“I am also struggling to see the insight provided by the content of the paper, even if the methodology can be corrected. In my opinion the whole paper could be summarised in two sentences: “Blockchains can provide timestamped proof-of-existence for documents (see e.g. http://proofofexistence.com). So for example you might encode the existence of a clinical trial protocol in a blockchain to ensure it is not subsequently tampered with.”; since this is arguably the point of one of the references published some years previously (https://www.bgcarlisle.com/blog/2014/08/25/proof-of-prespecified-endpoints-in-medical-research-with-the-bitcoin-blockchain/, which incidentally contains an alarmingly similar methodology), I do not see value in publishing the present work.”
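The two-sentence summary in that review is easy to make concrete. Below is a minimal sketch, in Python, of the proof-of-existence pattern Knottenbelt describes: only the document’s SHA-256 digest is committed to the blockchain, and verification recomputes the digest from the document. The file path and function names here are illustrative assumptions, not anything taken from the paper.

```python
import hashlib

def protocol_fingerprint(path: str) -> str:
    """Return the SHA-256 digest of a trial protocol document.

    Committing this digest to a blockchain timestamps the document
    without revealing its contents.
    """
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def verify_fingerprint(path: str, committed_digest: str) -> bool:
    """Re-hash the document and compare it to the on-chain commitment.

    Any post-hoc edit to the protocol changes the digest, so a match
    shows the document existed, unaltered, when the commitment was made.
    """
    return protocol_fingerprint(path) == committed_digest
```

The second sentence of Knottenbelt’s summary is the verification step: anyone holding the protocol can recompute the digest and compare it with the on-chain commitment.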
It’s a bit disappointing and ironic that I’m not credited here for this retraction.
To recap, on March 8, 2017, I published a detailed blog post titled “The most interesting case of scientific irreproducibility?” The post described how Irving & Holden’s timestamping method was broken. While the study had previously been criticized for plagiarism, I was the first to discover that the implementation was irreparably botched.
On March 24, 2017, F1000Research contacted me and invited me to submit my criticisms as a research note. I declined since I felt retraction was the just course. During this time, I was in touch with Benjamin Carlisle, author of the plagiarized blog post. Discussion with Carlisle — an expert in ethics — helped convince me to strongly encourage F1000Research to retract the article.
On March 30, 2017, Irving & Holden posted version 3 of their study. This version conceded the fundamental error I discovered, without admitting that their blockchain timestamp was consequently nonexistent. The same day, I updated my blog post to account for version 3.
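To see why that concession matters, it helps to spell out what a reproducibility check looks like under the Carlisle-style scheme the paper claimed to follow, in which a document’s SHA-256 digest is used directly as a Bitcoin private key. Below is a rough sketch, assuming the third-party ecdsa package, an uncompressed public key, and a Python build whose OpenSSL exposes ripemd160; the file name and address at the end are hypothetical placeholders.

```python
import hashlib
import ecdsa  # third-party: pip install ecdsa

BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check(payload: bytes) -> str:
    """Base58Check-encode a version-prefixed payload (Bitcoin address format)."""
    data = payload + hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = BASE58[r] + out
    pad = len(data) - len(data.lstrip(b"\x00"))  # leading zero bytes -> "1"
    return "1" * pad + out

def address_from_document(path: str) -> str:
    """Derive the Bitcoin P2PKH address implied by using the document's
    SHA-256 digest directly as a private key (uncompressed public key)."""
    with open(path, "rb") as f:
        priv = hashlib.sha256(f.read()).digest()  # 32 bytes
    vk = ecdsa.SigningKey.from_string(priv, curve=ecdsa.SECP256k1).verifying_key
    pubkey = b"\x04" + vk.to_string()             # uncompressed SEC encoding
    h160 = hashlib.new("ripemd160", hashlib.sha256(pubkey).digest()).digest()
    return base58check(b"\x00" + h160)            # 0x00 = mainnet P2PKH version

# The check, in essence: the address funded in the paper must match the
# one derived from the document. (Both values below are hypothetical.)
# assert address_from_document("protocol.pdf") == "1AddressReportedInThePaper"
```

If the funded address had been derived this way, anyone could re-derive it from the protocol document alone; an address generated at random by a wallet breaks that chain, leaving nothing on the blockchain that provably refers to the document.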
The next day (March 31, 2017), F1000Research emailed me the following: “In response to the points you have raised regarding the validity of this study, we are inviting additional independent reviewers with expertise in blockchain technology and cryptography.” In addition, they added the following editorial note to the manuscript: “Due to concerns raised about the methods and scientific validity of this paper, as well as the completeness of the peer review process (see reader comments on this article), advice from an additional independent peer reviewer with expertise in blockchain technology and cryptography is being sought.”
On May 12, 2017, F1000Research contacted me to say they were having trouble finding reviewers with the appropriate blockchain expertise. I responded with several suggestions. I believe Dr. Knottenbelt was asked to review independently of my suggestions. On May 22, 2017, William Knottenbelt provided his review.
Knottenbelt’s review reaches the same conclusions as my blog post from three months earlier. My blog post provides more depth and background. Furthermore, Knottenbelt’s review presents no criticisms beyond those in my blog post; indeed, many of his points and references are the same. I don’t view this as a problem in itself: Knottenbelt was asked to review the article in light of my concerns, and we are both making the right arguments. However, I do wish Knottenbelt’s review had credited my blog post. I find it highly unlikely that Knottenbelt came to these conclusions entirely independently, especially since I had twice commented on the article. Perhaps because my blog post was not credited, Retraction Watch failed to make the connection that my analysis was the novel finding that led to this retraction.
No hard feelings. Just wanted to set the record straight and point out the recurring theme of blog posts not getting proper academic recognition.
I am the author of the original blog post that the retracted paper was based on. I’m relieved that this is behind me, but I’m not sure that I can agree that any of this is how scientific publishing “should happen.”
I first alerted F1000 to methodological problems in this paper (along with the issue of the similarities to my blog) by email over a year ago, on May 19, 2016. F1000 publicly acknowledged that they were aware of the validity issues in the case they submitted in advance to the August 2016 COPE Forum, in which they remark, “the student and supervisor and some others who commented publicly have also questioned the scientific validity of the way in which the proof-of-concept was demonstrated in the article.”
https://publicationethics.org/case/what-extent-plagiarism-demands-retraction-vs-correction
I felt that the similarities between my blog and this paper were enough to warrant retraction, but F1000 did not investigate the validity issues I indicated to them. They considered the case “closed” after issuing a correction for the similarities between my post and Irving and Holden’s paper.
This case vindicates my earlier contention that plagiarism may merit retraction, not just correction. If an author did not come up with an idea, they may not properly execute it. And even if they can, they certainly should not be set up as an authority on that subject. A retraction can protect the scientific literature, and not just serve as “punishment.”
I agree with Knottenbelt’s assessment in his comment that there is very limited insight to Irving and Holden’s paper. The idea merited a short blog post. In 2014.
When the original blog post was written, there were no automated systems for performing Bitcoin timestamping like there are now. Having such a low-level description of a now-standard cryptographic technique in the medical literature was strange.
I am encouraged by the way that the scientific literature has corrected itself after such a disheartening episode.
The work of Daniel Himmelstein in particular was exemplary and should be noted and credited publicly. He reviewed this paper on his own initiative and posted an extremely thorough critique on his blog:
http://blog.dhimmel.com/irreproducible-timestamps/
After Himmelstein’s review was posted as a comment to the paper, F1000 solicited an “additional independent peer reviewer,” namely Knottenbelt, whose review also does not pull any punches.
The analyses performed to investigate the errors are available in Himmelstein’s GitHub repository:
https://github.com/dhimmel/irreproducible-timestamps
The basic idea is rather old; see the Wikipedia article https://en.wikipedia.org/wiki/Trusted_timestamping. I had a student implement this same idea in 2014 as a bachelor’s thesis project; he was building on many other ideas that had been published previously in the computer science literature.
Although I’m not a blockchain expert, when Retraction Watch first broke this story I read through the paper and wrote a scathing critique of its quality, and immediately made a note on PubMed Commons:
https://medium.com/@OmnesRes/medical-students-cant-help-but-plagiarize-apparently-f81074824c17
Just to say I don’t think anyone is trying to take away from Daniel’s contributions to this debate, which are a matter of public record and are clearly visible under the comments section of the article on the F1000 site, and are clearly part of the review narrative accompanying the paper. I don’t think anyone could miss them. Indeed, because of the concerns he raised, I was brought in to give my opinion on the article as an independent reviewer and that’s exactly what I have done – my comments are based on my own attempts to recreate the methodology described in the paper, based in turn on my 20+ years of experience as an academic computer scientist, and my two years of experience as Director of the Imperial College Centre for Cryptocurrency Research and Engineering. For the avoidance of doubt, I don’t think blockchain-related domain expertise is exclusively housed in any one individual and I absolutely don’t seek to take the credit for this retraction; I simply partook in the peer review process along with others, including Daniel.
Thanks, William, for your comment.
I should note that I did appreciate your review. Of the four invited reviews, yours is the only one to critically evaluate the study. Additionally, you succinctly describe the major flaws, namely the random address generation, the poor hashing practices, the reliance on trusted third parties, and the triviality. I also appreciated your link to Let’s Talk Bitcoin Episode 65 (I was an editor for LTB in 2014).
Open peer review is a recent development and nascent practice, at least in the biomedical fields. Irving & Holden’s study highlights some interesting consequences of attributed and public reviews.
First, reviewers are more accountable for their reviews. The insufficient nature of the initial three reviews is now readily apparent. Hopefully, F1000Research will learn from this incident to strengthen their reviewer selection against “peer-review rings” and related weaknesses.
Second, reviews are becoming part of the permanent scientific record, especially since F1000Research and other diligent journals assign DOIs to reviews. The question then becomes to what extent reviews are distinct scientific works that, in addition to evaluating the underlying manuscript, should also provide sufficient referencing to existing works. I think this is an open question, and William’s review was an overwhelmingly positive contribution, although I wish it had explicitly placed itself in the context of my previous comments.
If we want post-publication peer review to achieve its full potential, then incentives of academic credit will be essential. In my case, Irving & Holden’s study was not directly related to my research, but when I noticed its irreproducibility and backstory, I decided to take the time to investigate and chronicle its flaws. It’s worth noting that Irving and Holden’s updates in version 3, which were prompted by my earlier blog post, made the incorrect address generation more obvious. Exactly how to quantify contributions in the fluid and scattered landscape of post-publication peer review is, of course, difficult.
So in conclusion, I’m happy this study has finally been retracted. I’m happy that William was able to lend his blockchain expertise to finally make the retraction happen. And I’m glad that we’re touching on the difficult issues of open review as well as post-publication peer review. Despite these difficulties, I’m extremely encouraged by the prospect of scientific evaluation and discourse becoming more open. It’s important to consider the alternative. Had we followed the traditional practice of closed review and communications with journal editors, it’s likely that the many individuals who worked hard to re-examine and expose this study would have received little to no recognition.
I’ve just read Jordan’s comment and his Medium post, and it is spot on as well. So that too should be acknowledged as a contribution to the review narrative.