So how often does medical consensus turn out to be wrong?

In a quote that has become part of medical school orientations everywhere, David Sackett, often referred to as the “father of evidence-based medicine,” once famously said:

Half of what you’ll learn in medical school will be shown to be either dead wrong or out of date within five years of your graduation; the trouble is that nobody can tell you which half – so the most important thing to learn is how to learn on your own.

Sackett, we are fairly sure, was making an intentionally wild estimate when he said “half.” [See note about these strikethroughs at bottom of post.] But a ~~A~~ fascinating study out today in the Archives of Internal Medicine gives a clue as to the real figure ~~suggests that he may have been closer than any of us imagined~~.

The authors of the study, from Northwestern and the University of Chicago, looked at a year’s worth of studies in the New England Journal of Medicine (NEJM) that “made some claim with respect to a medical practice.” There were 124 such studies, and 16 — that’s 13 percent, or about one in eight — constituted reversals. As the authors note:

Reversals included medical therapies (prednisone use among preschool-aged children with viral wheezing, tight glycemic control in intensive care unit patients, and the routine use of statins in hemodialysis patients), invasive procedures (endoscopic vein harvesting for coronary artery bypass graft surgery and percutaneous coronary intervention for chronic total artery occlusions and atherosclerotic renal artery disease), and screening tests. In several cases, current guidelines were contradicted by the study in question, as indicated in the third column of the eFigure.

~~So a quick back-of-the-envelope calculation suggests that at 13 percent reversals per year during four years of medical school, for 52 percent, Sackett may have been remarkably close to reality.~~ Then again, that calculation makes a lot of assumptions, including ones about how representative this sample is, which we’ll get to in a moment.

Still, as long as that one in eight figure is in the ballpark — and there’s reason to think it is — the results mean that a good chunk of current medical practice is likely to be reversed over time. And although that doesn’t mean all of those original papers should be retracted, you can see our obvious interest in this.

We asked study co-author Vinay Prasad, by email, to elaborate on the findings. First, what made the team decide to do this analysis:

For a long time, we have been interested in what we believe to be a pervasive problem in modern medicine. Namely, the spread of new technologies and therapies without clear evidence that they work, which are later (and often after considerable delay) followed by contradictions, which, in turn, after yet another delay, are followed by changes in practice and reimbursement.

One might contend that if the cardinal principle of medicine is ‘first do no harm,’ reversal violates it. First, there is the harm to the patients who underwent the therapy during the years it was in favor, and second, the harm to future patients until there is a change in practice. And lastly, there are the diffuse harms, such as loss of trust in the medical system. The USPSTF’s change to its mammography recommendation for 40-year-old women had painful repercussions, and was based in large part on the Lancet reversal in 2006.

Prasad noted that there had been an earlier study in JAMA of contradictions in medical research by John Ioannidis, whose work on the shoddy state of clinical evidence has been getting more attention lately:

Ioannidis shows that among highly cited research, 16% of findings are later contradicted. There were two limits to Ioannidis’ paper, however, despite its merits. The first is that high-citation papers may overrepresent controversy, as controversial topics draw further citations and discussion. Although, to be fair, this is likely a limit to our paper as well.

The second is that we were specifically interested in what percentage of the standard of care is ultimately mistaken, which is different from what percentage of high-citation papers are later contradicted. The former represents what doctors actually do, while high-citation papers may not necessarily reflect clinical practice.

Naturally, we wondered whether the results could be generalized to journals other than the NEJM:

There are reasons to believe we are both over- and underrepresenting reversal. So the answer is: we don’t know.

Overrepresent: The NEJM probably gets more reversals than other journals because good reversals are large randomized trials, which are highly coveted by high-impact journals (Lancet, NEJM, JAMA). So the rate of reversal may not be the same for other journals, particularly lesser ones.

Underrepresent: On the other hand, of all the possible testable questions (those that arise from current clinical practice), only a fraction are being tested at any given time. The bulk of the NEJM is evaluating new therapies (72%, in our paper) as opposed to established ones. There are likely more reversals out there. Further work is warranted, and we have some ideas on how to extend our analysis.

So how long do these kinds of reversals typically take?

No one has yet looked at how long practices survive before they are ‘reversed.’ But I would argue that it has changed over the years, and has accelerated since the Cardiac Arrhythmia Suppression Trial (CAST) in the early 1990s. Reversal probably happens faster now (though it is still pretty slow). The nesiritide study in this week’s NEJM has a nice editorial by Eric Topol, who talks about a ‘lost decade,’ i.e. how long nesiritide was used before being contradicted. But examples like the routine use of the pulmonary artery catheter (one of the early reversals) took decades before a solid reversal (the ESCAPE trial in 2005).

And do these reversals have an impact? If so, how long do they tend to take to change practice?

Yes, they have an impact, but only after considerable delay. One person has studied how long after ‘reversal’ it takes before the medical community accepts the contradiction. John Ioannidis published a paper called Persistence of Contradicted Claims in the Literature, in which he looked at claims that had been disproven in the medical literature; for one notable example, he found that “a decade had passed from the contradiction of its effectiveness [before] counterarguments were uncommon.” My guess is a decade is about right.

Prasad concluded:

We want to stress the implications of reversal.  Reversal implies harm which is multifaceted and enduring.

That squares with a recent report that retractions are linked to patient harm. It doesn’t mean doctors who use evidence that is later shown to be wrong have bad intentions. (In fact, as long as they’re using evidence, they’re ahead of some physicians.) But it does provide yet another reason to read the Archives of Internal Medicine’s “Less Is More” series, which is where this study appeared. It’s some of the most consistently skeptical and evidence-based stuff we see anywhere.

Of course, half of it may be wrong within four years…

Correction, 11:45 p.m. Eastern, 7/11/11: Eagle-eyed Retraction Watch reader Dan Fagin noted that our back-of-the-envelope calculations trying to link this to Sackett’s “half” of all studies being proven wrong within four years were, well, wrong. As he pointed out, we can’t multiply 13 percent × 4 years and get 52 percent, because the universe of studies has also quadrupled over those four years. Apologies for the error, which is solely our fault and not that of the authors of this important Archives paper.
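(A quick illustration of Fagin’s point, under an intentionally crude assumption we are adding here: even if an independent 13 percent of a fixed set of practices were reversed in each of four years, the annual rates would compound rather than add:

\[
1 - (1 - 0.13)^4 = 1 - 0.87^4 \approx 0.43
\]

That works out to roughly 43 percent, not 52. And even this is only a sketch, since the 13 percent describes one year’s worth of NEJM studies, not the full body of what medical students are taught.)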

8 thoughts on “So how often does medical consensus turn out to be wrong?”

  1. Not sure about this. The fact that one trial contradicts the consensus doesn’t mean the consensus, which was (hopefully) built on the back of many trials, is wrong. The trial might be wrong.

    For example the medical consensus is that homeopathy doesn’t work and that it should not be prescribed.

    If this were wrong, it would be doing great harm, because homeopathy is a cheap and safe intervention.

    However, the fact that the occasional study comes out claiming that homeopathy does work, doesn’t shake the consensus.

    It would be better to look at the guidelines themselves (or even at actual practice, though that’s harder to measure), find out how often they change, and then look back and ask how early they could have changed based on the literature available, e.g. to work out the lag between the publication of the studies that led to a change and the change actually being made.

    1. Specifically examining the 16 trials we call reversal would make it hard to hold this position.

      In each case we discuss, the original ‘consensus’ was built on data far more limited than the later NEJM trial. Typically the trials in the NEJM are larger (often than all predecessors combined), better controlled, or more carefully designed, and sometimes all three; accordingly, they are more likely to approximate the ‘true’ effect of the therapies in question.

  2. 16% of highly cited papers being reversed means that 84% of highly cited papers are not reversed. 1 out of 8 clinical practice findings being reversed means that 7 out of 8 are not reversed.

    I don’t have an understanding of what the threshold for reversed findings is, but those numbers are pretty positive for the medical research community.

    I agree that all initial trials need to be stronger — better controls, proper outcomes, high statistical power — so that the initial consensus data that introduces a change to clinical practice has a strong foundation.

    And though 10 years is a long time for a change to happen, getting useful clinical data usually requires 2-5 years of follow-up, depending on the immediacy of a treatment. Repeat a study 2-3 times to confirm a finding, then let those bulletproof findings disseminate into a community (doctors) that acts defensively in the light of mistakes in practice, and one can see why it takes decades to change clinical practice.

  3. I specifically remember being told this on our first day in medical school. Later I found out that even when evidence is produced refuting the efficacy of a treatment (or diagnostic method), it is sometimes ignored, even for a century. I refer to my discovery, in 1983, of a book published in 1899 (or thereabouts) on the use of X-rays (which had just been discovered). The chest X-rays done in a small but decisive study in this book showed that the method of physical diagnosis of heart size was completely useless. It’s not possible to determine heart size by percussion (an otherwise valuable technique)… but we were still taught how to do percussion of heart size in physical diagnosis class in medical school (I won’t say when, because then you’ll guess how old I am).

  4. The findings of the article were drawn from a broad base of data, and ‘the’ issue is Do No Harm; the roughly 80 percent cited as likely positive results muddies the assertion, since material harm to patients (unwanted consequences) trumps the asserted benefits. Further, there is no excuse, absolutely no excuse, for delaying changes to accepted protocol when it is found to be causing harm to patients.

  5. Sorry for coming late to the comment party, but there’s an extreme disconnect between Sackett’s claim and the subject matter of the Archives paper. Sackett made a claim about the proportion of “what you’ll learn in medical school” that will later turn out to be wrong. Unless one wanted to claim that the subject matter of the 124 NEJM studies that “made some claim with respect to a medical practice” constituted a random sample of things that one learns in medical school (and I’m sure one wouldn’t), there’s no relationship between the percentage of the 124 studies that were unfavorable findings and the percentage of a medical school education that is actually in error.

    In fact, isn’t it the case that the 124 NEJM studies cited were specifically evaluating medical claims that were most likely to be false a priori? I don’t think, for instance, that the NEJM is likely to publish a point-counterpoint about whether the heart helps circulate blood throughout the body.

    Sackett’s claim was obviously hyperbolic and intended rhetorically, and I assume any discussion of it to be largely tongue-in-cheek. But I still think it’s misleading to draw any connection between it and the Archives paper, especially in a post where you originally went so far as to do some faulty arithmetic to arrive at the desired 50%.

  6. The provenance of the opening quotation is questionable: I first heard it attributed to Harvard Dean Burwell in a talk he gave to incoming medical students roughly 70 years ago. The point is well taken, no matter who made it first. It wouldn’t surprise me to learn it was Hippocrates.
