Retraction Watch

Tracking retractions as a window into the scientific process

Weekend reads: Publishing’s day of reckoning; an Impact Factor discount — on lunch; a prize for negative results

with 2 comments

The week at Retraction Watch featured mass resignations from a journal’s editorial board, software that writes papers for you, and a retracted retraction. Here’s what was happening elsewhere:

Like Retraction Watch? Consider making a tax-deductible contribution to support our growth. You can also follow us on Twitter, like us on Facebook, add us to your RSS reader, sign up on our homepage for an email every time there’s a new post, or subscribe to our daily digest. Click here to review our Comments Policy. For a sneak peek at what we’re working on, click here. If you have comments or feedback, you can reach us at

Written by Ivan Oransky

November 11th, 2017 at 9:00 am

Posted in weekend reads

  • Jonathan L. Seagull November 12, 2017 at 6:23 am

    Ask a thousand academics: the only people who dislike Sci-Hub are rent-seekers. It is interesting to see how far the US courts will be willing to go to support rent-seeking.

  • Steven McKinney November 14, 2017 at 9:08 pm

    Frank Harrell’s comments about the ORBITA blinded placebo controlled clinical trial of heart stents are most unfortunate indeed. I continue to lament that a statistician I have long trusted now rails against reasonable statistical studies, bashing p-values inappropriately, and harping that only Bayesian methods can save us.

The high-profile paper published in the Lancet (the ORBITA trial) shows a very good understanding of statistical issues. The authors recognized that no truly blinded study of this medical manoeuvre had ever been done. Years of anecdotal publications litter the literature, enough to convince many who do not understand statistical issues that this trial would be unethical. What is unethical is to continue to promote ill-founded medical manoeuvres based on poorly done studies.

These authors worked hard to convince others of the errors in their thinking, and arranged for a proper blinded clinical trial. They registered their trial plan beforehand and pre-published it with the Lancet. They identified a minimum difference of medical relevance (a 30-second difference between the two groups) and performed a power calculation using then-available data, which showed that 100 cases per treatment group would provide 80% power to detect such a difference.

The authors, reviewers, and editor did not make the classic "absence of evidence is not evidence of absence" error that Harrell alleges in an update to a previous blog entry of his ("Statistical Errors in the Medical Literature", first published April 8 and updated November 4, 2017). In the presence of a power analysis showing the sample size was adequate to detect any difference larger than the smallest one of medical relevance, a large p-value does provide sound statistical evidence that a difference of medical relevance is likely not present, i.e. that the null hypothesis is the relevant hypothesis to accept, at the stated type II error rate. This study measured a difference of 16.6 seconds between the two groups, well below the minimum difference of medical relevance specified a priori. If the true difference had been more than 30 seconds, 4 out of 5 such studies would have detected it. This study did not yield such a measurement, so the statistical conclusion is entirely valid: these data support the null hypothesis at the stated type II error rate.

Harrell has become fond of picking the end-point of a confidence interval and saying "see, the difference could be this big, so accepting the null hypothesis is bogus, and they should have done a Bayesian analysis". Harrell declares this clinical trial to be "small". The authors' power analysis showed that 200 cases would be adequate to detect a difference of 30 seconds or more in 80% of trial attempts (a type II error rate of 20%). So in what sense is this trial small? It includes the requisite number of cases indicated by a proper power analysis. Testing too few cases is a waste of resources, since a large p-value then cannot be interpreted as allowing the null hypothesis to be accepted: with too few cases, a large p-value yields no interpretable result, because the type II error rate is either unknown or unacceptably large. With many more cases than a power analysis indicates, the trial risks denying an effective treatment to the control group should the treatment show a medically relevant improvement. That is the whole point of doing a priori power analyses: to ensure that neither too few nor too many cases are enrolled in the trial. Both scenarios have ethical problems.
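The a priori sample-size step can be sketched by inverting the usual power formula. As before, the 75-second standard deviation is an illustrative assumption, not a figure from the trial; the sketch only shows how "about 100 per arm, ~200 in total" falls out of a 30-second threshold at 80% power:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Smallest per-group sample size giving the requested power to
    detect a true mean difference `delta` (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = nd.inv_cdf(power)           # quantile for the target power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# With an ASSUMED SD of 75 seconds and the 30-second relevance threshold:
print(n_per_group(delta=30, sd=75))  # 99, i.e. about 100 per arm
```

Halving the detectable difference roughly quadruples the required sample size, which is why the choice of the minimum relevant difference drives the whole design.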

This weekend I will host a visiting friend who is recovering from a stroke induced by a portion of a stent breaking away and lodging in his brain. My friend will travel by train, as he can no longer drive because of likely permanent damage to the part of the brain needed to process important driving-related visual cues. Placing a stent is not a benign manoeuvre, and if people are going to suffer consequences such as my friend experienced, there should be solid statistical evidence that the manoeuvre provides substantial medical benefit, enough to outweigh the harms it can also induce. Unfortunate comments such as Harrell's do not help clarify these issues for those not well versed in statistical methods.
