Collateral damage: Paper — and editorial, and author’s response — retracted in one fell swoop

A journal has retracted the results of a clinical trial comparing strategies for bladder tumors after the authors mischaracterized how patients were assigned to each procedure.

In addition, the journal European Urology has pulled a string of correspondence between author Harry Herr at Memorial Sloan Kettering Cancer Center and an outside expert, who had questioned aspects of the study totally unrelated to the methodology, such as its generalizability.

Here’s the retraction notice for “Randomized Trial of Narrow-band Versus White-light Cystoscopy for Restaging (Second-look) Transurethral Resection of Bladder Tumors”:

The author of this paper has requested that it be retracted because of an incorrect representation of the study methodology. Contrary to the manuscript contents, patients were randomly assigned to narrow band or white light cystoscopy in the order of their attendance, rather than in a permuted block allocation. Patients were individually consented for the clinical procedures, but not to a prospective randomized clinical trial where randomization occurred by permuted block. An appropriate waiver was obtained from the Institutional Review Board to analyze and publish anonymized data from the clinical database.

The 2015 paper has yet to be cited, according to Thomson Reuters Web of Science.

When the paper was published, it was accompanied by an editorial by Peter Black, a researcher at the University of British Columbia in Vancouver. The editorial (“Narrowing the Cystoscopy Gap”) didn’t appear to take issue with the study methodology — noting “randomization was perfect” — but instead questioned the extent of the study’s generalizability.

Black also criticized the trial for being underpowered, a limitation the original paper itself acknowledged after failing to achieve a statistically significant result.

In the same issue, the journal published Herr’s response to Black’s editorial, titled “A Better Transurethral Resection—Proved or Not!”

Since the journal retracted the original paper, it decided to pull the correspondence related to it, as well. Here’s the retraction notice for Black’s editorial:

This editorial has been retracted at the request of the Editor-in-Chief and the Author following the retraction of the original article (Eur Urol 67 (2015) 605, http://dx.doi.org/10.1016/j.eururo.2014.06.049) to which it refers. The author of this editorial was not involved in the original study referred to.

And here’s the retraction notice for Herr’s response:

This reply has been retracted at the request of the Editor-in-Chief and the Author following the retraction of the original article (Eur Urol 67 (2015) 605, http://dx.doi.org/10.1016/j.eururo.2014.06.049).

When we emailed Black about why his editorial was retracted, he said it was standard procedure:

My editorial comment was retracted simply because an editorial comment linked to a retracted article is generally also retracted. It has nothing to do with the editorial itself, but rather the retracted manuscript.

We also emailed Herr and the journal but haven’t heard back.

Hat tip: Tansu Kucukoncu


6 thoughts on “Collateral damage: Paper — and editorial, and author’s response — retracted in one fell swoop”

  1. “Black also criticized the trial for being underpowered, something the original paper also pointed out by failing to achieve a statistically meaningful result.” Is the failure to obtain a statistically meaningful result proof of an underpowered study? Should studies and trials continue until a “statistically” meaningful result is obtained? How would one know that in advance?

  2. The failure to obtain a statistically meaningful result is not necessarily proof of an underpowered study. When measured effect sizes are small, a statistically meaningful result, as opposed to a statistically significant result, requires an a-priori power calculation before a declaration of “no difference” can be supported by a test result yielding a large p-value. A meaningful power calculation requires the identification of the magnitude of the measured effect that is scientifically or medically or biologically relevant.

    Figuring out how large an effect is relevant is generally not a simple exercise, but it is rarely impossible to do. Sadly, researchers rarely make the effort, and thus frequently (and erroneously) declare “no difference” or “no measurable effect” when their small-sample test yields a large p-value.

    A properly conducted statistical evaluation of a scientific question requires identifying what measured effect indicates no meaningful effect size (often expressed as the null hypothesis) and what measured effect indicates a meaningful effect size (often expressed in some form as the alternative hypothesis). Enough data should be collected so that the statistical test will reject the null hypothesis with high frequency (i.e. in the vast majority of replications of the study) should an effect size as large or larger than the size of scientific relevance be present. Ensuring that enough data was collected is the whole point of an a-priori power calculation, sadly rarely done save for well run clinical trials.

    Collecting a huge amount of data, then crowing about all statistically significant differences, can also yield fallacious conclusions, when the measured effect size is considerably smaller than any effect size of scientific relevance. Such statistically significant findings are not necessarily statistically or scientifically meaningful.

    Fallacious conclusions result when a statistically significant difference is achieved and touted but no consideration of a scientifically relevant effect size is discussed, or when no statistically significant difference is found, and this is touted as proof of no difference without any a-priori power calculation and its attendant discussion of scientifically meaningful effect size. Statistically meaningful conclusions require understanding effect sizes of relevance, and the amount of data needed to reliably detect such differences in repeated rounds of experiments.

  3. For this study, the author of this retracted paper stated that a “sample size of 250 patients was planned to provide 90% power (with two-sided type I error of 5%) to detect a clinically meaningful difference of 20% in tumor-free recurrence rates between the two groups, assuming equal distribution of patients and allowing 5% loss to follow-up.”

    Thus the claim is made that a reasonable a-priori power calculation was performed, with a medically relevant difference of interest identified. To then state that the study was underpowered because the desired outcome wasn’t realized is statistically fallacious. The proper conclusion (had the randomization been properly done) would have been that there is no difference of medical relevance between the two imaging modalities, because a proper a-priori power analysis was done, and a non-significant result obtained.
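The quoted design can be sanity-checked with the standard normal-approximation sample-size formula for comparing two proportions. A minimal sketch in Python; note that the baseline tumor-free rates below are hypothetical assumptions, since the quoted passage specifies only the 20% difference, 90% power, and two-sided 5% alpha:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.90):
    """Per-group sample size for a two-sided test of two proportions
    (standard normal-approximation formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, two-sided
    z_b = NormalDist().inv_cdf(power)           # quantile for desired power
    pbar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * pbar * (1 - pbar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# Hypothetical baseline rates, chosen only for illustration; the paper
# states the 20% difference but not the rates themselves.
n = n_per_group(0.55, 0.75)    # 118 per group under these assumptions
total = ceil(2 * n / 0.95)     # inflate for the stated 5% loss to follow-up
```

Under these assumed rates the formula gives 118 patients per group, or roughly 250 after inflating for 5% loss to follow-up, in line with the planned enrollment; other baseline assumptions would shift the figure.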

  4. I’m having trouble with this bit:
    “Contrary to the manuscript contents, patients were randomly assigned to narrow band or white light cystoscopy in the order of their attendance, rather than in a permuted block allocation. Patients were individually consented for the clinical procedures, but not to a prospective randomized clinical trial where randomization occurred by permuted block. An appropriate waiver was obtained from the Institutional Review Board to analyze and publish anonymized data from the clinical database.”

    1. Being randomized to 1 of 2 treatments is not clinical care – it’s research.

    2. I would like to know more about the process of informed consent. Was there a written consent form? From the bit quoted above it seems as if patients may have been, um … confused into believing randomization to 1 of 2 treatments was the most appropriate treatment for them in their individual circumstances. It’s difficult to understand how this could be possible. The technical term for this is “bad.”

    3. I’m confused about the “appropriate waiver” allegedly obtained from the IRB. IRBs can approve, decline to approve, or require changes to proposed research as a condition of approval. IRBs can also determine whether a proposed activity requires IRB review.

    From a distance it seems as if the IRB either approved or exempted (from the requirement for IRB review) publication of de-identified data that was in existence at the time. Does it follow that there was no IRB review prior to randomizing patients to 1 of 2 treatments? It’s hard to believe the IRB would approve publication of data from research it never approved.

    254 patients with high-risk bladder cancer were randomized to 1 of 2 treatments – without IRB review and approval? At a leading institution like Sloan Kettering? Really?

  5. One point that hasn’t been mentioned is that sometimes the sample size calculations are wrong because some of the assumptions are incorrect. For example, there is always a need for an estimate of the proportion of failures in one group. This doesn’t seem to have happened here.

    What usually happens is that someone starts off by deciding that a 10% difference is the minimum that is meaningful. Then the sample size calculations are done: they need, say, 1000 patients, and they don’t have the time or money for the study. So they change their minds and do the sample size based on a minimum difference of 20%. Then they do the study, find a difference of 10%, and, as expected, it is not significant. They then try to put a positive spin on it; one way is what they have done here.

    One point about this is that MSKCC has a very good biostatistics group. I don’t know how funding works in America, but I expect that at the least he could have talked to someone about what he should be doing. What he has is observational data, and that requires more complex methods. If those methods turned out a strong result, it would be strong justification for a clinical trial and funding. That would involve a statistician who would be able to look at ways of improving the power: a larger sample size is one, and longer follow-up and better statistical methods would also help.
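The trap described above — sizing a study to detect a 20% difference and then observing roughly a 10% one — can be made concrete with a quick Monte Carlo power estimate. A rough sketch under hypothetical rates and group sizes; none of these numbers come from the paper itself:

```python
import random
from math import sqrt

def two_prop_z(x1, x2, n):
    """Pooled two-proportion z statistic for equal group sizes."""
    p1, p2 = x1 / n, x2 / n
    pbar = (x1 + x2) / (2 * n)
    se = sqrt(pbar * (1 - pbar) * 2 / n)
    return (p2 - p1) / se if se > 0 else 0.0

def power_sim(p1, p2, n, sims=4000, crit=1.96, seed=1):
    """Fraction of simulated trials where |z| exceeds the critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        x1 = sum(rng.random() < p1 for _ in range(n))  # events in group 1
        x2 = sum(rng.random() < p2 for _ in range(n))  # events in group 2
        if abs(two_prop_z(x1, x2, n)) > crit:
            hits += 1
    return hits / sims

# Hypothetical: ~118 per group is roughly what a 20% difference requires,
# but the true difference simulated here is only 10%.
est = power_sim(0.55, 0.65, 118)   # power of about one third
```

With a true 10% difference, a trial sized for a 20% difference detects it only about a third of the time, so a non-significant result says little about effects of that smaller size.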
