Editor’s note: This guest post by Csaba Szabo is a response to a June 3 post by Mike Rossner on replication studies. We sent a draft to Rossner in advance; find his response below.
The recent guest post on Retraction Watch by Mike Rossner takes a peculiar view of reproducibility. Rossner sets the stage by discussing the executive order on “restoring gold standard science” and a call from National Institutes of Health director Jay Bhattacharya for replication studies to determine “which NIH-funded research findings are reliable.” Then Dr. Rossner takes this position: “Conducting systematic replication studies of pre-clinical research is neither an effective nor an efficient strategy to achieve the objective of identifying reliable research.”
If “systematic” in the above statement means “universal,” then, of course, that is impossible, considering the millions of preclinical papers published every year. If, however, “systematic” means choosing which studies to replicate and then replicating them, then it is indeed possible. And this is exactly what Bhattacharya’s statement calls for: “identification of key scientific claims” that require replication. As explained below, this approach can indeed work in an effective and efficient manner, especially if it primarily focuses on new manuscript submissions.
Let’s also remind ourselves: The reproducibility crisis is not a novel concept. In his famous “Roadmap” published in 2014, then-NIH Director Francis Collins laid it out plainly:
Preclinical research, especially work that uses animal models, seems to be the area that is currently most susceptible to reproducibility issues. …. Some irreproducible reports are probably the result of coincidental findings that happen to reach statistical significance, coupled with publication bias. … Still, there remains a troubling frequency of published reports that claim a significant result, but fail to be reproducible.
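The mechanism Collins describes (chance findings crossing a significance threshold, filtered by publication bias toward positive results) can be made concrete with a toy simulation. All the rates below are illustrative assumptions chosen for the sketch, not empirical estimates:

```python
import random

random.seed(0)

# Toy model: many hypotheses are tested, only a minority are true,
# and journals preferentially publish "significant" results.
# The published literature then contains a large share of false
# positives that will not replicate. Numbers are assumptions.

N = 100_000          # hypotheses tested across the field
P_TRUE = 0.10        # assumed fraction of hypotheses that are actually true
ALPHA = 0.05         # significance threshold (false-positive rate)
POWER = 0.50         # assumed chance a true effect reaches significance

published_true = 0   # real effects that reached significance
published_false = 0  # chance findings that reached significance

for _ in range(N):
    if random.random() < P_TRUE:
        if random.random() < POWER:
            published_true += 1
    else:
        if random.random() < ALPHA:
            published_false += 1

total = published_true + published_false
print(f"published 'significant' findings: {total}")
print(f"share that are false positives: {published_false / total:.0%}")
```

Under these assumptions, nearly half of the published “significant” findings are false positives even though no one committed fraud; publication bias alone gets you into the neighborhood of the 50% “folklore number.”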
Nature even reported that NIH officials were “considering a requirement that independent labs validate the results of important preclinical studies as a condition of receiving grant funding.” Thus, replication studies have been frequently discussed and advocated over the last decade. And there was always some pushback, as well. But this time, it feels different to me: today’s pushback may be less about the substance of the proposals and more about political dislike of those proposing them.
Dr. Rossner underestimates how serious the reproducibility problem truly is. He begins with a “folklore number” of 50% irreproducibility, jumps to Bayer and Amgen’s infamous reports claiming 66% and 89% irreproducibility, then circles back to 50% based on a more recent study.
The Bayer number, however, isn’t 66%; it’s much worse. Researchers at the company found that the main data sets were reproducible in only 7% of replication projects, and that some of the data were reproducible in just 4%. The article states that in “~20–25% of the projects were the relevant published data completely in line with our in-house findings.”
Meanwhile, the Errington study Dr. Rossner cites found that 60% of the “positive findings” in original papers could not be reproduced. A recently completed replication study in Brazil produced a 21% replication rate. And yet, Dr. Rossner puts the word “crisis” in scare quotes — as though the problem might not be real — and suggests irreproducibility is just part of science being a “self-correcting” enterprise. That might have flown in the 1980s, but today, it sounds outdated.
In the next part of his essay, he outlines his arguments for why replication studies — especially NIH-funded ones — are not the answer to the replication crisis.
Argument #1: It’s Not Feasible
Dr. Rossner points out that over a million scientific articles are published annually. But again, nobody is proposing to replicate all of them. The idea is to identify high-impact, influential claims and replicate those.
Argument #2: Who Will Select What to Replicate?
Dr. Rossner notes that “someone who does careless work is not likely to volunteer to have it replicated.” No kidding. But that is not an argument against replication studies. That would be like arguing we shouldn’t arrest criminals because they won’t cooperate with the prosecution.
We know that many authors resist replication. Errington’s Reproducibility Project in cancer biology and the Brazilian replication project both showed how uncooperative original authors can be. But that’s just another reason to advocate replication, not to abandon it. That’s why I’ve proposed in my book so-called “replication supplements”: a system in which manuscripts submitted to top journals would include a key finding replicated independently, blinded and randomized, by a separate laboratory. The replication would be appended to the main article.
Journals –– perhaps starting with the big ones, Nature, Science, Cell, PNAS –– could easily adopt this any day. But, once again, there is pushback. Here are some of the objections I’ve heard so far:
- “It would delay publication.” But not necessarily — replication can be built into the project timeline.
- “It would cost too much.” But not really. A flagship paper can easily cost $500,000–$1 million. Spending $40,000–60,000 for third-party validation of a selected key experiment is reasonable.
- “There is no incentive for the replicator group.” It depends. A replicator CRO would get paid, a replicator academic group could get authorship.
- “All journals would have to do it together.” Why? One brave journal could start and lead.
My suggestion is that such a supplement shouldn’t be mandatory, but incentivized with guaranteed peer review. With desk rejection rates north of 90% in top journals, that alone would flip the incentive structure.
Argument #3: What Counts as Replication?
Dr. Rossner is concerned that replication must be defined carefully, and worries that replicators might lack the technical expertise. I agree that “rerunning every assay” or checking every paper is impossible. But we can focus on a few key experiments and start being proactive about replication. In my view, in 90% of the papers that describe lab-based preclinical studies, one can easily identify a single, clean, replication-worthy key experiment. In some specialized fields it might be challenging to find a replicator lab. Challenging, but not impossible. In cell biology or preclinical pharmacology, however, the key experiments use standard methods and could easily be set up for replication. And in cases where direct replication is impractical — say, in large clinical trials — one could attempt an independent statistical reanalysis of the data.
The final section of Dr. Rossner’s article offers some well-trodden alternatives to direct replication: better training, more rigorous design, reagent authentication. These are all fine, and they’ve been discussed ad nauseam, in Collins’ Roadmap and before. But experience tells us they’re not enough. He also suggests catching bad papers through image forensics. Sure — but by that point, the damage is done. At best, forensic analysis filters out a fraction of bad submissions at the editorial stage and protects some journals — those willing to dig into their profits and spend extra money on fraud detection. And in the end, those flawed papers will simply find another journal willing to publish them.
Which is why I don’t see any alternative to finally putting direct replication studies on the table. What is the use of a literature where the majority of the published studies are irreproducible? Who does that serve? How can anyone build on it? It still boggles the mind that anyone in the scientific community would take the time to argue against replication.
- In an ideal world, scientists, as a matter of routine, would design and conduct replication studies to verify their own findings. They should insist that some of their funding be spent on checking and replicating their data –– well before writing up the findings for publication.
- In an ideal world, institutions would treat reproducibility as essential to their reputation. Because how could they patent and translate findings that do not stand up to validation? And which investigator would be crazy enough to go and work at an institution that is known to churn out unreproducible papers? And what funding agency would give grants to an institution with a questionable reputation? (Remember, I am still talking about an ideal world here.)
- Taxpayers demand validated, translatable science. So, yes, the NIH should start spending some of its money (to be exact, the American taxpayers’ money) on replication and validation. Either by paying grantees to do it, or by conducting some of the replication and validation work internally. There is actually an entire NIH institute, called NCATS (National Center for Advancing Translational Sciences), whose scope, in an ideal world, would include this type of activity.
- In a perfect world, scientific journals should also be adamant about making sure that the material submitted to them is validated, real, and independently repeatable. Why not, then, invest a fraction of their significant profits –– for example, Elsevier’s current profit margin is 40% –– in experimentally vetting some of the work they publish? Either during the peer review process, or even after the paper is out. And –– if the key findings do not reproduce –– they should amend or even retract the paper.
In that sense, Dr. Rossner is half-right: it shouldn’t be the NIH who is paying for replication. Better said, it shouldn’t just be the NIH: institutions and journals should chip in, too. In an ideal world, they would do all of that on their own, in their own interest, without any outside prodding. Of course, I’m aware that we don’t live in that ideal world. But we should not lose sight of the reason –– the only reason –– why we are all doing biomedical research in the first place. If, with some reallocation of funding priorities, a shift in research focus, and a rethinking of the publication process, we can create a more trustworthy scientific literature, then by all means we should give it a try.
Csaba Szabo is a professor of pharmacology at the University of Fribourg, Switzerland.
Response from Mike Rossner:
I appreciate Dr. Szabo’s careful reading of my guest post. It seems that he interpreted my post as “against replication,” which was not my intention. I agree with Dr. Szabo that reproducibility is a problem and that replication is part of the solution. I apologize if that was unclear.
I do not, however, think that a systematic, post-publication, replication program is an effective or efficient way of doing replication studies, for the very practical reasons outlined in my original post. Politics had nothing to do with that opinion. I would have opposed such a proposal from any administration.
I do not think that my use of the term “systematic” was unclear; I referred to a selection process several times in the post, and it was thus clear that I was not referring to a “universal” effort to replicate every study post-publication. Dr. Szabo refers to selection processes in his post, but it remains unclear who decides what counts as a “high-impact, influential claim,” a “key finding,” or a “top journal.” More importantly, any selection process limits the number of studies replicated, so that a large percentage of published studies remain irreproducible.
Dr. Szabo tries to debunk some of the practical reasons presented in my original post, but he does not address the most critical one—timing: “people will have already spent time figuring out for themselves whatever the findings of the replication initiative later reveal.” In other words, the “organic” post-publication replication system already works in some sense, in that the work that is deemed important enough to build on already gets tested quickly by people vested in the outcome.
In my original post, I stated that I believe researchers could be motivated by various stakeholders to replicate their own studies BEFORE publication. It seems that Dr. Szabo and I agree about the importance of pre-publication replication and the role of various stakeholders in making that happen. Dr. Szabo argues, however, that publishers make so much money that they should pay for replication studies. That’s just not going to happen at any significant scale. It took 20 years to get the large commercial publishers to begin to pay for image screening pre-publication, which has an obvious financial benefit to them of avoiding costly investigations if issues are raised post-publication. There is no obvious financial burden to a publisher if a study cannot be replicated post-publication, because that is not necessarily grounds for an investigation.
Once again, I appreciate Dr. Szabo’s attention to this important topic. There are a lot of issues to consider before taxpayers start spending billions of dollars on post-publication replication studies.
— Mike Rossner