Editor’s note: This guest post by Csaba Szabo is a response to a June 3 post by Mike Rossner on replication studies. We sent a draft to Rossner in advance; find his response below.
The recent guest post on Retraction Watch by Mike Rossner takes a peculiar view of reproducibility. Rossner sets the stage by discussing the executive order on “restoring gold standard science” and a call from National Institutes of Health director Jay Bhattacharya for replication studies to determine “which NIH-funded research findings are reliable.” Then Dr. Rossner takes this position: “Conducting systematic replication studies of pre-clinical research is neither an effective nor an efficient strategy to achieve the objective of identifying reliable research.”
If “systematic” in the above statement means “universal,” then, of course, that is impossible, considering the millions of preclinical papers published every year. If, however, “systematic” means choosing which studies to replicate and then replicating them, then it is indeed possible. And this is exactly what Bhattacharya’s statement calls for: “identification of key scientific claims” that require replication. As explained below, this approach can indeed work in an effective and efficient manner, especially if it primarily focuses on new manuscript submissions.
Let’s also remind ourselves: The reproducibility crisis is not a novel concept. In his famous “Roadmap” published in 2014, then-NIH Director Francis Collins laid it out plainly:
Preclinical research, especially work that uses animal models, seems to be the area that is currently most susceptible to reproducibility issues. …. Some irreproducible reports are probably the result of coincidental findings that happen to reach statistical significance, coupled with publication bias. … Still, there remains a troubling frequency of published reports that claim a significant result, but fail to be reproducible.
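The mechanism Collins describes (chance findings crossing a significance threshold, filtered by publication bias toward positive results) can be made concrete with a toy simulation. All the rates below are illustrative assumptions chosen for the sketch, not empirical estimates:

```python
import random

random.seed(0)

# Toy model: many hypotheses are tested, only a minority are true,
# and journals preferentially publish "significant" results.
# The published literature then contains a large share of false
# positives that will not replicate. Numbers are assumptions.

N = 100_000          # hypotheses tested across the field
P_TRUE = 0.10        # assumed fraction of hypotheses that are actually true
ALPHA = 0.05         # significance threshold (false-positive rate)
POWER = 0.50         # assumed chance a true effect reaches significance

published_true = 0   # real effects that reached significance
published_false = 0  # chance findings that reached significance

for _ in range(N):
    if random.random() < P_TRUE:
        if random.random() < POWER:
            published_true += 1
    else:
        if random.random() < ALPHA:
            published_false += 1

total = published_true + published_false
print(f"published 'significant' findings: {total}")
print(f"share that are false positives: {published_false / total:.0%}")
```

Under these assumptions, nearly half of the published “significant” findings are false positives even though no one committed fraud; publication bias alone gets you into the neighborhood of the 50% “folklore number.”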
Nature even reported that NIH officials were “considering a requirement that independent labs validate the results of important preclinical studies as a condition of receiving grant funding.” Thus, replication studies have been frequently discussed and advocated over the last decade. And there was always some pushback, as well. But this time, it feels different to me: today’s pushback may be less about the substance of the proposals and more about political dislike of those proposing them.
Dr. Rossner underestimates how serious the reproducibility problem truly is. He begins with a “folklore number” of 50% irreproducibility, jumps to Bayer and Amgen’s infamous reports claiming 66% and 89% irreproducibility, then circles back to 50% based on a more recent study.
The Bayer number, however, isn’t 66%; it’s much worse. Researchers at the company found that the main data sets were reproducible in only 7% of replication projects, and that some of the data were reproducible in just 4%. The article states that in “~20–25% of the projects were the relevant published data completely in line with our in-house findings.”
Meanwhile, the Errington study Dr. Rossner cites found that 60% of the “positive findings” in original papers could not be reproduced. A recently completed replication study in Brazil produced a 21% replication rate. And yet, Dr. Rossner puts the word “crisis” in scare quotes — as though the problem might not be real — and suggests irreproducibility is just part of science being a “self-correcting” enterprise. That might have flown in the 1980s, but today, it sounds outdated.
In the next part of his essay, he outlines his arguments for why replication studies — especially NIH-funded ones — are not the answer to the replication crisis.
Argument #1: It’s Not Feasible
Dr. Rossner points out that over a million scientific articles are published annually. But again, nobody is proposing to replicate all of them. The idea is to identify high-impact, influential claims and replicate those.
Argument #2: Who Will Select What to Replicate?
Dr. Rossner notes that “someone who does careless work is not likely to volunteer to have it replicated.” No kidding. But that is not an argument against replication studies. That would be like arguing we shouldn’t arrest criminals because they won’t cooperate with the prosecution.
We know that many authors resist replication. Errington’s Reproducibility Project in cancer biology and the Brazilian replication project both showed how uncooperative original authors can be. But that’s just another reason to advocate replication, not to abandon it. That’s why I’ve proposed in my book so-called “replication supplements”: a system in which manuscripts submitted to top journals would include a key finding replicated independently, blinded and randomized, by a separate laboratory. The replication would be appended to the main article.
Journals –– perhaps starting with the big ones, Nature, Science, Cell, PNAS –– could easily adopt this any day. But, once again, there is pushback. Here are some of the objections I’ve heard so far:
- “It would delay publication.” But not necessarily — replication can be built into the project timeline.
- “It would cost too much.” But not really. A flagship paper can easily cost $500,000–$1 million. Spending $40,000–60,000 for third-party validation of a selected key experiment is reasonable.
- “There is no incentive for the replicator group.” It depends. A replicator CRO would get paid, a replicator academic group could get authorship.
- “All journals would have to do it together.” Why? One brave journal could start and lead.
My suggestion is that such a supplement shouldn’t be mandatory, but incentivized with guaranteed peer review. With desk rejection rates north of 90% in top journals, that alone would flip the incentive structure.
Argument #3: What Counts as Replication?
Dr. Rossner is concerned that replication must be defined carefully, and worries that replicators might lack the technical expertise. I agree that “rerunning every assay” or checking every paper is impossible. But we can focus on a few key experiments and start being proactive about replication. In my view, in 90% of the papers that describe lab-based preclinical studies, one can easily identify a single, clean, replication-worthy key experiment. In some specialized fields it might be challenging to find a replicator lab. Challenging, but not impossible. In cell biology or preclinical pharmacology, however, the key experiments use standard methods and could easily be set up for replication. And in cases where direct replication is impractical — say, in large clinical trials — one could attempt an independent statistical reanalysis of the data.
The final section of Dr. Rossner’s article offers some well-trodden alternatives to direct replication: better training, more rigorous design, reagent authentication. These are all fine, and they’ve been discussed ad nauseam, in Collins’ Roadmap and before. But experience tells us they’re not enough. He also suggests catching bad papers through image forensics. Sure — but by that point, the damage is done. At best, forensic analysis filters out a fraction of bad submissions at the editorial stage and protects some journals — those willing to dig into their profits and spend extra money on fraud detection. And in the end, those flawed papers will simply find another journal willing to publish them.
Which is why I don’t see any alternative to finally putting direct replication studies on the table. What is the use of a literature where the majority of the published studies are irreproducible? Who does that serve? How can anyone build on it? It still boggles the mind that anyone in the scientific community would take the time to argue against replication.
- In an ideal world, scientists, as a matter of routine, would design and conduct replication studies to verify their own findings. They should insist that some of their funding be spent on checking and replicating their data –– well before writing up the findings for publication.
- In an ideal world, institutions would treat reproducibility as essential to their reputation. Because how could they patent and translate findings that do not stand up to validation? And which investigator would be crazy enough to go and work at an institution that is known to churn out unreproducible papers? And what funding agency would give grants to an institution with a questionable reputation? (Remember, I am still talking about an ideal world here.)
- Taxpayers demand validated, translatable science. So, yes, the NIH should start spending some of its money (to be exact, the American taxpayers’ money) on replication and validation. Either by paying grantees to do it, or by conducting some of the replication and validation work internally. There is actually an entire NIH institute, called NCATS (National Center for Advancing Translational Sciences), whose scope, in an ideal world, would include this type of activity.
- In a perfect world, scientific journals should also be adamant about making sure that the material submitted to them is validated, real, and independently repeatable. Why not, then, invest a fraction of their significant profits –– for example, Elsevier’s current profit margin is 40% –– in experimentally vetting some of the work they publish? Either during the peer review process, or even after the paper is out. And –– if the key findings do not reproduce –– they should amend or even retract the paper.
In that sense, Dr. Rossner is half-right: it shouldn’t be the NIH who is paying for replication. Better said, it shouldn’t just be the NIH: institutions and journals should chip in, too. In an ideal world, they would do all of that on their own, in their own interest, without any outside prodding. Of course, I’m aware that we don’t live in that ideal world. But we should not lose sight of the reason –– the only reason –– why we are all doing biomedical research in the first place. If, with some reallocation of funding priorities, a shift in research focus, and a rethinking of the publication process, we can create a more trustworthy scientific literature, then by all means we should give it a try.
Csaba Szabo is a professor of pharmacology at the University of Fribourg, Switzerland.
Response from Mike Rossner:
I appreciate Dr. Szabo’s careful reading of my guest post. It seems that he interpreted my post as “against replication,” which was not my intention. I agree with Dr. Szabo that reproducibility is a problem and that replication is part of the solution. I apologize if that was unclear.
I do not, however, think that a systematic, post-publication, replication program is an effective or efficient way of doing replication studies, for the very practical reasons outlined in my original post. Politics had nothing to do with that opinion. I would have opposed such a proposal from any administration.
I do not think that my use of the term “systematic” was unclear; I referred to a selection process several times in the post, and it was thus clear that I was not referring to a “universal” effort to replicate every study post-publication. Dr. Szabo refers to selection processes in his post, but it remains unclear who decides what counts as a “high-impact, influential claim,” a “key finding,” or a “top journal.” More importantly, any selection process limits the number of studies replicated, so that a large percentage of published studies remain irreproducible.
Dr. Szabo tries to debunk some of the practical reasons presented in my original post, but he does not address the most critical one—timing: “people will have already spent time figuring out for themselves whatever the findings of the replication initiative later reveal.” In other words, the “organic” post-publication replication system already works in some sense, in that the work that is deemed important enough to build on already gets tested quickly by people vested in the outcome.
In my original post, I stated that I believe researchers could be motivated by various stakeholders to replicate their own studies BEFORE publication. It seems that Dr. Szabo and I agree about the importance of pre-publication replication and the role of various stakeholders in making that happen. Dr. Szabo argues, however, that publishers make so much money that they should pay for replication studies. That’s just not going to happen at any significant scale. It took 20 years to get the large commercial publishers to begin to pay for image screening pre-publication, which has an obvious financial benefit to them of avoiding costly investigations if issues are raised post-publication. There is no obvious financial burden to a publisher if a study cannot be replicated post-publication, because that is not necessarily grounds for an investigation.
Once again, I appreciate Dr. Szabo’s attention to this important topic. There are a lot of issues to consider before taxpayers start spending billions of dollars on post-publication replication studies.
— Mike Rossner