As if peer reviewers weren’t overburdened enough, imagine if journals asked them to also independently replicate the experiments they were reviewing? True, replication is a big problem — and always has been. At the November 2016 SpotOn conference in London, UK, historian Noah Moxham of the University of St Andrews in Scotland mentioned that, in the past, some peer reviewers did replicate experiments. We asked him to expand on the phenomenon here.
Retraction Watch: During what periods in history did peer reviewers repeat experiments? And how common was the practice?
Noah Moxham: Not tremendously! It was quite common at the Royal Academy of Sciences (after the 1789 Revolution, the Institut de France) in Paris from about the mid-eighteenth century. It was mostly used to evaluate the work of outsiders — meaning, non-Academy members. There were also exercises in systematic replication between the Royal Society of London and the Oxford Philosophical Society in the early 1680s, when magnetic experiments and chemical analysis of minerals would be carried out in one location and details of the experiment (together with the raw material, where necessary) were sent to be tried at the other. But it’s difficult to call that peer review because it wasn’t explicitly tied to any kind of publishing or gatekeeping protocol.
RW: Were there any remarkable examples in history where referees repeated experiments and obtained some notable results?
NM: There may well be, but I’ve not come across them – not specifically among attempts at replication by referees, anyway! There were some impressive early instances of referees reaching the wrong decision, though – Joseph Fourier’s work on the propagation of heat in a solid body was kept from reaching print for almost 15 years by the referees at the Institut, despite repeated examinations and the fact that the same work won one of the Institut’s prizes in 1812. In that case the scepticism was about the rigour of his mathematical method, even though the referees failed to come up with any instance in which it didn’t work.
RW: Today, it would be almost unheard of for a reviewer to repeat an experiment. When and why did the practice stop?
NM: Largely for the reasons you would expect: it was time-consuming and potentially very expensive. When the Royal Society instituted written editorial refereeing at its journals in the early 1830s, one of the first people approached as a referee was Michael Faraday. He took it for granted that replicating the experiments on which he was being asked to comment was a natural part of the process and couldn’t see how a referee could pronounce authoritatively on the basis of anything less – but he also said that he didn’t have time to repeat them and that he didn’t think it was reasonable to ask! That was right around the same time the Parisians gave up on replication as part of refereeing, on the grounds that it added further delay to what was already a pretty slow route to publication.
RW: Do you think it is a good idea for peer reviewers to repeat experiments? And how feasible would that be in today’s science, given how quickly scholarly literature is expanding?
NM: Good question. It sounds like a fine idea in principle – although it’s worth pointing out that not all science is experimental, and that certain kinds of experimental science have a far easier time controlling the variables than others, so the feasibility and value of replication would differ from case to case. Even in instances where it might be applicable I think it’s doubtful how much rigour it would really add – there’d still have to be a core presumption of good faith and competence on the part of the experimenter, extended to cover the referee as well as the primary researcher. Granted, the referee would (or should) have less vested interest in the outcome of the research, but one of the complaints frequently heard about refereeing is not just that it purportedly lacks rigour but that too often it isn’t impartial. (That’s a historical as well as a contemporary problem!)
RW: Although establishing rigorous peer review and reproducibility together seems like killing two birds with one stone, what are the potential downsides to combining the two?
NM: Well, it could drive up the cost of experiment considerably, and correspondingly increase the pressure already complained of in science to produce positive, eye-catching results. But I think the question itself is problematic, because peer review has functions that replication can’t assume or make meaningfully more rigorous. Peer review isn’t a stamp of epistemic authority, and I think we go wrong in trying to treat it as such; it sets a threshold for scientific publication: that a notionally independent person with roughly appropriate expertise has found a given piece of research sound, plausible, or intriguing enough to warrant publication in a particular venue. As currently practised, it stands in for independent replication, at which it falls short, but it also has a host of other, more subjective functions. Put like that, it doesn’t sound like the impregnable fortress against error and malpractice that it’s too often cracked up to be in public discourse, where it’s invoked as shorthand for the rigour of scientific method and so routinely asked to do more than it reasonably can. It’s fundamentally a compensatory mechanism, and it can’t deliver ideal rigour where other conditions – research funding, or the prestige economy of academic science – are less than ideal.
That said, I don’t think peer review is a bad system. It represents an expert initial judgement of whether enough information has been given to replicate a study, of whether the results of the study seem persuasive on its own terms, and of whether those terms are methodologically legitimate. Actual replication attempts might then be reserved for particularly important or controversial or unexpected results; that could provide the replicator with a publication, as you suggest; the original paper and its authors would benefit from the prestige of passing more rigorous scrutiny; and the public would benefit from the more secure establishment of important knowledge. There’s a really strong argument, especially right now, for expanding the role and prestige of replication, but we should keep in mind that any system of scientific assessment will still rely to a large degree on the good faith of those involved, researchers and assessors alike. There’s no way to make it bullet-proof, so the need for organisations like Retraction Watch will probably continue.
I once had a review of a paper where the reviewer had replicated. To paraphrase: “My grad student tried this and it worked.” It does happen, though this was a methods paper and the method is rather simple!
Thank you for the historical perspective.
This isn’t going to happen in any situation with living creatures, be they undergraduate students, cancer patients, lab mice, or cultures of Helicobacter pylori.
I’ve seen referees doing this, but mostly for computational papers. It’s not a fair way of framing the problem, though: it is done routinely by everyone as post-publication peer review (PPPR)! There is no reason why a peer reviewer evaluating a paper after publication should not be called that…
Yes, interesting. PPPR is still far from systematic, though, which is why I’d want to distinguish between it and ordinary peer review, which at minimum is supposed to guarantee that at least one competent person has read the paper before it’s published. PPPR is an optional extra layer of quality control rather than a necessary minimum, and although it may well be more rigorous than pre-publication review in many instances, I’m not sure it’s a sufficiently coherent and distinct process that it’s helpful to call it by the same name as peer review. I do think there’s something curious about the special weight given to pre-publication review, though – it’s as if the fact of being called upon to review confers prestige on the reviewer’s opinions, though there’s no prima facie reason why that should be so…
I once had a reviewer of a computer simulation paper write a quick and dirty simulator to check aspects of my results he found (with good reason) to be suspicious. Since the reviewer disclosed his identity, I have listed him as ‘recommended reviewer’ on all similar papers I’ve subsequently written, because you can hardly beat that for reviewer diligence!
I have from time to time written a few lines of code to check something, but never on the scale he did. (As part of a post-publication review I re-scaled and re-plotted the original data from one _Science_ paper, something I wish the regular reviewers had done! But I confess I’ve never done this as a regular reviewer myself.)
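To give a sense of what such a quick check can involve, here is a minimal sketch in Python; the data file, column layout, and claimed exponent are hypothetical placeholders rather than the actual paper’s data:

```python
# Minimal sketch of a post-publication re-scaling/re-plotting check.
# "digitized_fig2.csv" and the claimed exponent of -1.5 are made-up placeholders;
# in practice the numbers would come from a supplement or a plot digitizer.
import numpy as np
import matplotlib.pyplot as plt

x, y = np.loadtxt("digitized_fig2.csv", delimiter=",", unpack=True)

# Re-plot on log-log axes, where a claimed power law y ~ x**(-1.5) is a straight line.
fig, ax = plt.subplots()
ax.loglog(x, y, "o", label="digitized data")
ax.loglog(x, y[0] * (x / x[0]) ** -1.5, "--", label="claimed slope -1.5")

# Fit the slope directly as a rough consistency check against the claim.
slope, _ = np.polyfit(np.log(x), np.log(y), 1)
ax.set_title(f"fitted slope = {slope:.2f}")
ax.set_xlabel("x (original units)")
ax.set_ylabel("y (rescaled)")
ax.legend()
plt.show()
```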
Very cool! Replicating results of computational models just based on what is included in papers can be surprisingly difficult (speaking as a doctoral student who has tried, for the purposes of learning mechanisms better)! For older papers I’ve found computationally efficient shortcuts that seem to make sense, but can alter some of the results (though never the more important results).
If the journals are willing to pay for lab expenses, fund the reagents and a salary, and provide me with all the fancy strains of transgenic mice, I will be more than willing to do so :p
While replication apparently did not happen much during journal peer review, it’s important to realize that replication may have happened regularly in other ways when results were announced (more than in recent times). I may be wrong, but I think that in the early years of the Royal Society, researchers regularly replicated their work under the eyes of an audience, by conducting a live demonstration (I hope someone can verify whether this is true). Relatedly, in the case of Galileo, I believe his observations of heavenly bodies were not accepted until others were able to fashion equally good telescopes and replicate the observations.
Hi Alex,
You’re right, and that’s an important historical point. Indeed there’s a sense in which a large part of the function of early scientific institutions, and especially the Royal Society, was precisely to reproduce experiments communicated from elsewhere and multiply credible witnesses to them. But this wasn’t by any means consistent. And there’s some evidence that the experiments weren’t always tried in front of witnesses but outside of Society meetings, with the results of the replication attempt then reported to the meeting. (It’s been suggested that the difference may turn on the language in which the result was recorded: whether it was said to have been ‘shown’, which might only indicate that the results were reported or, if possible, brought in, or ‘tried’, which indicated that part of a plenary meeting of the Society was given over to attempting the experiment.) And it’s equally important to realise, as you say, that such testing wasn’t by any means a prerequisite of publication in the early journal associated with the Royal Society, the Philosophical Transactions…
Thanks very much for this corroboration plus important details and nuance, Noah. If only the Society had built on and formalised that experiment-replication aspect rather than letting it wither! I suspect we wouldn’t have today’s reproducibility problems. But as you pointed out, and Faraday said, it’s too much to expect from a peer reviewer. Another means was needed. Only now, 300 years later, are we finally implementing solutions: a shift in the scientific publishing system and incentives to encourage publication of exact replications.
Organic Syntheses and Inorganic Syntheses are two journals which publish directions for important new synthetic methods in chemistry; part of the review process includes replication (and commenting) by the reviewers. And yes, the names of the “checkers” are published. It is a lot of work for everyone, but the resulting syntheses are recognized as among the most reliable ones in the literature.
With all journals being online today, perhaps each published paper can have a section called “Independently reproduced” where other groups can submit their data/references indicating which data were successfully reproduced. Of course, such postings must not be anonymous (to reduce fake testimonials) but I can’t imagine anyone fearing repercussions for posting something supporting the authors. Such input would provide a huge measure of confidence for readers that the data are genuine and of high quality.
Working in computational materials science, I actually try to replicate parts of the results fairly often as part of my regular reviews, but I’ve never seen the same in my own papers, so I don’t think it’s common practice. I find that quickly reproducing some part of the results is the best way to spot if not enough details have been provided to reproduce the results… and then I can verify that convergence parameters and similar are in fact cranked up far enough.
I’ve caught a few cases of non-reproducible results this way. Unfortunately this has just led to the journal rejecting the paper outright, at the point where I would have wanted to see whether my input might help them get it right (or to figure out why the results don’t match). There are any number of journals to submit these types of calculations to, so I worry that authors just go on to the next one with bad data.
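For readers outside the field, the convergence check mentioned above boils down to increasing a resolution parameter until the quantity of interest stops changing. Here is a toy sketch of the idea; the numerical integral only stands in for a real materials calculation, and the tolerance and parameter names are arbitrary:

```python
# Toy sketch of a convergence check: double a resolution parameter until the
# quantity of interest changes by less than a tolerance. The trapezoid-rule
# integral below is only a stand-in for a real calculation (e.g. a plane-wave
# cutoff or k-point study); the tolerance and starting values are arbitrary.
import numpy as np

def quantity_of_interest(n_points: int) -> float:
    """Toy calculation whose accuracy depends on a resolution parameter."""
    x = np.linspace(0.0, np.pi, n_points)
    y = np.sin(x)
    # Trapezoid rule; the exact answer is 2.0, approached as n_points grows.
    return float(np.sum((y[:-1] + y[1:]) * np.diff(x) / 2.0))

def converged_value(tol: float = 1e-6, n_start: int = 8, n_max: int = 2**20) -> float:
    n = n_start
    previous = quantity_of_interest(n)
    while n < n_max:
        n *= 2
        current = quantity_of_interest(n)
        print(f"n = {n:8d}  value = {current:.8f}  change = {abs(current - previous):.2e}")
        if abs(current - previous) < tol:
            return current
        previous = current
    raise RuntimeError("not converged within n_max points")

if __name__ == "__main__":
    print("converged value:", converged_value())
```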
Check out the journal Organic Syntheses (http://www.orgsyn.org/). Specifically: “All procedures and characterization data in OrgSyn are peer-reviewed and checked for reproducibility in the laboratory of a member of the Board of Editors.” Check out papers where the checkers’ results are added as notes, pictures, etc. (e.g., http://www.orgsyn.org/demo.aspx?prep=v97p0189).