Retractions aren’t enough: Why science has bigger problems

Andrew Gelman

Scientific fraud isn’t what keeps Andrew Gelman, a professor of statistics at Columbia University in New York, up at night. Rather, it’s the sheer number of unreliable studies — uncorrected, unretracted — that have littered the literature. He tells us more, below.

Whatever the vast majority of retractions are for, they’re a tiny fraction of the number of papers that are just wrong — by which I mean they present no good empirical evidence for their claims.

I’ve personally had to correct two of my published articles.  In one paper, we claimed to prove a theorem which was in fact not true, as I learned several years after publication when someone mailed me a counterexample.  In the other paper, we had miscoded a key variable in our data, making all our empirical results meaningless; this one I learned about when a later collaborator was trying to replicate and extend that work.  So my own rate of retractions or corrections is something like 0.5%.

I’m a pretty careful researcher, and I can only assume that the overall rate of published papers with fatal errors is greater than my own rate of (at least) half a percent.  Indeed, in some journals in recent years, I think the error rate may very well approach 50% — by which I mean that I think something like half the papers claim evidence that they don’t really have.

In recent years we’ve seen various high-profile cases in social science of published research with “statistically significant” findings that were fatally flawed, where the data could not support the elaborate claims being made.  Examples include claims that some college students have ESP, that ovulation affects voting, that adopting a certain pose (“power pose”) makes you more powerful, that beautiful parents are more likely to have girls than boys, and that people react differently to hurricanes given male versus female names. In each of these cases, the issue is not that the underlying scientific claims are necessarily false (ESP might well exist, people do behave differently at different times of the month, and so on) but that the claimed evidence just wasn’t there to support the claims.  To put it another way, in any of these examples the data would also support the exact opposite of the claimed phenomena (college students having negative ESP, power posing having a negative effect, and so forth). And it’s not just frivolity.  Similar statistical concerns arose, for example, in a much-publicized study of the effect of air pollution on life expectancy.
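
To make that “the data would also support the exact opposite” point concrete, here is a minimal simulation sketch; the effect size, noise level, and sample size are illustrative assumptions of mine, not numbers from any of the studies above. When the true effect is small relative to the noise and the sample, even estimates that clear the conventional significance threshold land on the wrong side of zero a non-trivial fraction of the time.

```python
# Minimal sketch with assumed numbers: a small true effect, a noisy
# measurement, and a small sample.  Among the simulations that reach
# "statistical significance", a noticeable share point in the WRONG
# direction, so the same design can "support" a claim and its opposite.
import numpy as np

rng = np.random.default_rng(0)
true_effect, sd, n, sims = 0.1, 1.0, 20, 100_000

means = rng.normal(true_effect, sd, size=(sims, n)).mean(axis=1)
z = means / (sd / np.sqrt(n))            # z-test, treating sd as known
significant = np.abs(z) > 1.96           # two-sided p < 0.05
wrong_sign = significant & (means < 0)   # "significant" but opposite the truth

print(f"significant: {significant.mean():.1%} of simulations")
print(f"wrong sign among the significant: {wrong_sign.sum() / significant.sum():.1%}")
```

With these particular (assumed) numbers, roughly one in ten of the “significant” results points in the wrong direction.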

The above-mentioned studies have four features in common:

  1.  They are all (in my opinion) fatally flawed from a statistical perspective.
  2.  Nobody (including me) is suggesting that the data in any of these studies was faked.
  3.  None of these papers has been retracted or corrected.
  4.  None of these papers is ever going to be retracted or corrected.

Readers may balk at that last assertion. How am I so sure the papers won’t be corrected or retracted? Because, with extremely rare exceptions, bad statistics is not enough of a reason for anyone to fix the scientific record.  At most (and this too is very rare) the journal might publish a letter refuting the published paper, or an article explaining the error might be published elsewhere.  But usually not even that.

I’m not saying that all these papers should be retracted; rather, I’m saying that retraction/correction will only ever be a tiny part of the story.  Just look at the numbers.  Millions of scientific papers are published each year.  If 1% are fatally flawed, that’s tens of thousands of corrections to be made.  And that’s not gonna happen.  As has been discussed over and over on Retraction Watch, even when papers with blatant scientific misconduct are retracted, this typically requires a big struggle in each case.  The resources just aren’t there to adjudicate this for the tens of thousands of published papers a year which are wrong but which don’t involve misconduct.
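
As a rough back-of-the-envelope version of that scale argument, here is the arithmetic; the paper count and the retraction count are my illustrative assumptions, not figures from this post.

```python
# Rough scale comparison; every input is an order-of-magnitude assumption.
papers_per_year = 2_000_000    # assumed annual volume of published papers
fatal_flaw_rate = 0.01         # the 1% figure used in the argument above
retractions_per_year = 700     # assumed annual retractions, for comparison

flawed_per_year = papers_per_year * fatal_flaw_rate
print(f"fatally flawed papers per year: ~{flawed_per_year:,.0f}")
print(f"flawed papers per retraction: ~{flawed_per_year / retractions_per_year:.0f}")
```

The exact inputs matter less than the mismatch, which stays large under any plausible choice of numbers.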

Indeed, it seems that retractions and corrections are not so much about correcting the scientific record as about punishing wrongdoers and shoring up the reputation of journals with regard to their most embarrassing mistakes.  That’s fine — I agree that crime shouldn’t pay and that journals have every right (and even an obligation) to de-associate from cases of fraud.

My point here is that we shouldn’t think of retraction and correction as any kind of general resolution to the problem of published errors.  The scale is just all wrong, with tens of thousands of papers that are wrong in their empirical content, and orders of magnitude fewer papers being corrected or retracted.

In an article discussed recently on Retraction Watch, Daniele Fanelli recommended that authors be able to self-retract articles that are fatally flawed due to honest errors.  I think that’s fine (as noted above, on the two occasions when my own articles turned out to be fatally flawed, I contacted the journals and they ran corrections), but I think that when it comes to cleaning up the scientific literature and flagging errors, retraction won’t be the most useful tool.  Now and for the foreseeable future, it looks to me like retraction will be a rarely used tool, used mostly for flagging fraud.  To tag the sorts of run-of-the-mill errors which plague science and cause many thousands of erroneous papers to be published each year, we’ll need some more scalable versions of post-publication review.

Gelman blogs at http://andrewgelman.com/.


30 thoughts on “Retractions aren’t enough: Why science has bigger problems”

  1. Is there perhaps an error (a missing “not”) in the sentence “I’m saying that all these papers should be retracted;”

  2. Scientific truth is an emergent property…it emerges from a body of literature over many years, sometimes decades. No single paper is ever perfect, and we shouldn’t expect them to be. All we can ask is that 1) the researchers have acted in good faith, 2) their results can be subjected to replication, and 3) that reviewers will put their personal agendas (political, social, and/or career-oriented) aside when they adjudicate papers (I think we call that professionalism). The notion that new public policies should spring immediately from the latest study is both wrong-headed and dangerous…scientific truth is a collective creation.

    1. 1000% agreed !!! Let me just complete your last sentence:
      “scientific truth is a collective creation essentially based on data sharing and exchange of ideas.”

    2. We don’t really know whether that is so under current conditions. Now that being a scientist is just another profession, there may be too much noise in the system. A million mediocrities interspersed with a thousand outstanding individuals will likely produce something quite different than the thousand would working on their own.

    3. I agree. On the other hand, once a paper is shown to be erroneous or unreliable not because of the contemporary state of knowledge but because of errors clearly recognisable at the time of publication (e.g., due to imperfect review), it should be corrected, and refusing to do so should be considered unethical.

      Unfortunately, twice I tried to get an erratum from the authors/publishers, and twice I failed because they simply ignored the request:
      1) DOI: 10.1002/rcm.4453
      2) https://pubpeer.com/publications/F936AF9AC28DBF70D57BE9D41BDE43

    4. I would paraphrase you as saying: “science is the product of a large sample size, and errors here and there are acceptable even to the extent that they may completely invalidate a particular publication.” I disagree with that. I think more must be done to correct egregious errors. I do not know how, but this significantly lowers my perception of the scientific endeavor.

      I should qualify by stating my consideration specifically refers to peer-reviewed publications. I regard publications without such review as lower quality for the very reason being discussed here.

  3. Thanks for a fascinating article. The 50% number is of course just a hypothesis, for you have no evidence of a 50% error rate. If the academic error rate is 50%, the journalism error rate must be higher; I’ve always been under the impression that newspapers have lower R squareds than academic journals. If not, then universities may need to do some hard thinking as to their incentive structures, ethical climates, and value systems. If publish or perish has resulted in a 50% error rate, then it is a failed system. However, as I say, you have no real evidence of a 50% rate.

    1. In fact, several studies in the biomedical literature put the error rate in this league, if perhaps a little lower.

  4. The problem is that policy – speaking here of Health Care specifically – is being explicitly set on the basis of the “Evidence” obtained/published/taught by a myriad of research tools and methods – with the type of result issues Dr. Gelman discusses – and then it is set in stone (“settled science” as an oxymoron of the first order…). Not good.

  5. Is ESP = Elementary Statistics Problem?
    The linked article doesn’t say, either. Acronyms are a worse problem than statistics (which can be detected if wrong).

  6. This piece raises an important question; what is the purpose of a retraction? Is the purpose, as Dr. Gelman says, “correcting the scientific record”? What exactly is meant by the “scientific record” anyway? Is that record a blemish-free recounting of one success after another after another, with never a failure or false trail? Or does the record admit a possibility that science is not linear, that we spend a good deal of time stumbling around looking for a way forward?

    To me, a journal article is nothing like the last word. It is merely a signpost along the way, signifying that a reasonable number of miles have been traveled since the last signpost was erected. I don’t expect any paper to be perfect; a good paper often raises more questions than it answers. Some imperfections are more egregious than others and the worst probably should be retracted. But Dr Gelman is wrong to think that, “retraction will be a rarely used tool, used mostly for flagging fraud.” Fraud is the most common reason for retraction, which makes it all the more tragic that some researchers have a retraction on their record because a journal screwed up and published the same paper twice.

    Incidentally, Marco is right; there is a missing “not”. Should this post be retracted? Or can we accept the lack of perfection in everything we do?

  7. Personally and professionally, I think full retraction should be reserved for fraud and clear, intentional wrongdoing. The cure for mistakes, both minor and major, is…more research. While intricate replication is pretty rare (at least it is in the social sciences), research articles build upon one another and, over years, correct many mistakes or errors in databases, modeling, and statistical procedure (some debates over technical issues can be quite lively, in fact). Schools of thought are grounded in this style of publishing science.

    Of course, we can opt for aggressive retraction policies, but it might prove very Orwellian…memory holes dotting the scientific terrain and chronic reworking of scientific “history.” I really don’t think that’s the way to go…no one said that science (which IS a social process) is either linear or efficient.

  8. My point here is that we shouldn’t think of retraction and correction as any kind of general resolution to the problem of published errors. The scale is just all wrong, with tens of thousands of papers that are wrong in their empirical content, and orders of magnitude fewer papers being corrected or retracted.

    But wait, this flies in the face of Kent Anderson’s recent declaration that merely reporting on retractions has a “chilling effect” on “a community that is managing itself pretty well”:

    “We want journals to issue corrections, and to retract articles when necessary. But I’ve seen editors second-guess the wisdom of issuing corrections or retractions — because of the unwanted media attention, because aspersions will be cast, because researchers will be subject to lurid speculation. When this is happening, it’s clear that a tool [sic] like Retraction Watch, which is positioned for transparency, is having a chilling effect.”

    Don’t get me started on the “a little tawdry” bit, given the Andraka hit pieces.

  9. I’m about to start reading my monthly batch of ethics applications, focussing on the scientific component. Many of them will be flawed to a certain extent. Some of that will be things of a more ethical nature, for example writing the informed consent in a way that misinterprets the literature, but most will be a deficiency in the study. The main problem will be a lack of sample size: it will be obvious that they don’t have a sufficient sample to achieve the aims of their study. Unfortunately, poor analysis will overcome that. Neuroscientists generally seem to think that 20 subjects in a study is sufficient, and when you have 40 or 50 outcomes then something is going to show strong evidence of an effect (see the quick sketch at the end of this comment). Often they won’t reveal the techniques that will be used for analysis. This is a bit worrying considering that, when they do, it is often wrong.

    They will all be passed by the main ethics committee, because what matters is that the research will be publishable and it doesn’t create any embarrassing ethics problems, not that it is scientifically useful.
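
    A quick numerical sketch of that multiple-outcomes concern, using assumed numbers (independent outcomes, a conventional 0.05 threshold, and no real effects at all): at least one “significant” result among 40 or 50 outcomes is close to guaranteed.

    ```python
    # Chance of at least one p < 0.05 result among k independent null outcomes.
    # Real outcomes are usually correlated, which shifts the exact numbers but
    # not the basic point.
    alpha = 0.05
    for k in (20, 40, 50):
        p_any_hit = 1 - (1 - alpha) ** k
        print(f"{k} outcomes -> P(at least one 'significant' result) = {p_any_hit:.2f}")
    # prints roughly 0.64, 0.87, and 0.92
    ```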

  10. ” I think the error rate may very well approach 50%”. Gelman performs the amazing feat of deciding that the rate must be about 50% because his own rate is 0.5%. Thus, he multiplies his error rate by 100 to get the error rate of us dummies and fraudsters. This totally unsupported estimate is especially worrisome because 50% is now being quoted very widely. It gives science of all sorts, not just social science, a very bad image at a time when science is under attack. It is irresponsible to offer that sort of number with no real underlying evidence and to smear all areas of science equally. How do we know that it is really true? How many errors are substantive and how many are trivial? How many errors are obvious to the reader because they are a matter of unsatisfactory sample sizes and thus a matter of judgement? This important topic requires a nuanced analysis, not a decision to multiply by 100.

    1. I don’t think it is fair to blame Gelman for the misrepresentation of his words. He clearly added a qualifier: “in some journals”.

      1. Yes, “in some journals in some years.” I was thinking in particular of Psychological Science around 2013-2014. I have not done a careful survey of the articles–50% was just a guess–but a few times I looked at their lists of just-released papers, and it did seem that about half were in error. I doubt the overall rate of error of published papers is anything near 50%. 10% might be more like it. Still way too many to try to handle using the correction/retraction mechanism.

        1. I agree with Susan. It seems like I am equally justified in saying: in some journals in recent years, I think the error rate nears 100%. Note the components: “some” (which?), “recent” (vague), and “I think” (who doesn’t make mistakes every day?).

          I believe careless speculation should be left out of peer-reviewed publications. This article, however, is more of an editorial nature.

  11. It seems to me that with the growing prevalence of publicly archiving data as a requirement for publication, most problems of ‘fatally flawed’ inferences caused by flawed statistical analyses will be corrected without need of retraction. Someone who detects the flaw should / will simply publish a paper that is a reanalysis of the data, providing a (hopefully!) more sound conclusion. Those papers that are left uncorrected will be the ones whose results are too unimportant and trivial to bother correcting.

    1. Tom: You write, “Someone who detects the flaw should / will simply publish a paper that is a reanalysis of the data, providing a (hopefully!) more sound conclusion.” One reason I support more venues for post-publication review is that it can take a ridiculous amount of effort to publish a correction of someone else’s published paper. You talk about “a reanalysis of the data, providing a (hopefully!) more sound conclusion.” That’s fine, but what if I want to just alert people to fatal flaws in a published paper, _without_ going to the trouble of reanalyzing? That would be valuable too but it doesn’t typically fit into the publication process. Thus readers of the original paper will not have an easy way to find the note of error, future researchers can waste their time chasing noise, etc. People can waste their careers doing follow-up research on topics such as “embodied cognition” or “power pose” based on published papers that were fatally flawed.

      Or maybe not fatally flawed, maybe I’m wrong. But that’s fine. The point is not to expunge these papers from the literature but to make readers aware of their problems.
