As if peer reviewers weren’t overburdened enough, imagine if journals also asked them to independently replicate the experiments they were reviewing. True, replication is a big problem, and always has been. At the November 2016 SpotOn conference in London, historian Noah Moxham of the University of St Andrews in Scotland mentioned that, in the past, some peer reviewers did replicate experiments. We asked him to expand on the phenomenon here.
Retraction Watch: During what periods in history did peer reviewers repeat experiments? And how common was the practice?
Noah Moxham: Not tremendously! It was quite common at the Royal Academy of Sciences (after the 1789 Revolution, the Institut de France) in Paris from about the mid-eighteenth century. It was mostly used to evaluate the work of outsiders — meaning, non-Academy members. There were also exercises in systematic replication between the Royal Society of London and the Oxford Philosophical Society in the early 1680s, when magnetic experiments and chemical analysis of minerals would be carried out in one location and details of the experiment (together with the raw material, where necessary) were sent to be tried at the other. But it’s difficult to call that peer review because it wasn’t explicitly tied to any kind of publishing or gatekeeping protocol.
RW: Were there any remarkable examples in history where referees repeated experiments and obtained notable results?
NM: There may well be, but I’ve not encountered them – not specifically among attempts at replication by referees, anyway! There were some impressive early instances of referees reaching the wrong decision, though – Joseph Fourier’s work on the propagation of heat in a solid body was kept from reaching print for almost 15 years by the referees at the Institut, despite repeated examinations and the fact that the same work won one of the Institut’s prizes in 1812. In that case the scepticism was about the rigour of his mathematical method, even though the referees failed to come up with any instance in which it didn’t work.
RW: Today, it would be almost unheard of for a reviewer to repeat an experiment. When and why did the practice stop?
NM: Largely for the reasons you would expect: it was time-consuming and potentially very expensive. When the Royal Society instituted written editorial refereeing at its journals in the early 1830s, one of the first people approached as a referee was Michael Faraday. He took it for granted that replicating the experiments on which he was being asked to comment was a natural part of the process and couldn’t see how a referee could pronounce authoritatively on the basis of anything less – but he also said that he didn’t have time to repeat them and that he didn’t think it was reasonable to ask! That was right around the same time the Parisians gave up on replication as part of refereeing, on the grounds that it added further delay to what was already a pretty slow route to publication.
RW: Do you think it is a good idea for peer reviewers to repeat experiments? And how feasible would that be in today’s science, given how quickly scholarly literature is expanding?
NM: Good question. It sounds like a fine idea in principle – although it’s worth pointing out that not all science is experimental, and that certain kinds of experimental science have a far easier time controlling the variables than others, so the feasibility and value of replication would differ from case to case. Even in instances where it might be applicable, I think it’s doubtful how much rigour it would really add – there’d still have to be a core presumption of good faith and competence on the part of the experimenter, extended to cover the referee as well as the primary researcher. Granted, the referee would (or should) have less vested interest in the outcome of the research, but the complaints frequently heard about refereeing concern not just its purported lack of rigour but also its frequent lack of impartiality. (That’s a historical as well as a contemporary problem!)
RW: Although establishing rigorous peer review and reproducibility together seems like killing two birds with one stone, what are the potential downsides to combining the two?
NM: Well, it could drive up the cost of experiment considerably, and correspondingly increase the pressure already complained of in science to produce positive, eye-catching results. But I think the question itself is problematic, because peer review has functions that replication can’t assume or make meaningfully more rigorous. Peer review isn’t a stamp of epistemic authority, and I think we go wrong in trying to treat it as such; it sets a threshold for scientific publication: that a notionally independent person with roughly appropriate expertise has found a given piece of research sound, plausible, or intriguing enough to warrant publication in a particular venue. As currently practised, it stands in for independent replication, at which it falls short, but it also has a host of other, more subjective functions. Put like that, it doesn’t sound like the impregnable fortress against error and malpractice that it’s too often cracked up to be in public discourse, where it’s invoked as shorthand for the rigour of scientific method and so routinely asked to do more than it reasonably can. It’s fundamentally a compensatory mechanism, and it can’t deliver ideal rigour where other conditions – research funding, or the prestige economy of academic science – are less than ideal.
That said, I don’t think peer review is a bad system. It represents an expert initial judgement of whether enough information has been given to replicate a study, of whether the results of the study seem persuasive on its own terms, and of whether those terms are methodologically legitimate. Actual replication attempts might then be reserved for particularly important, controversial, or unexpected results; that could provide the replicator with a publication, as you suggest, while the original paper and its authors would benefit from the prestige of passing more rigorous scrutiny, and the public would benefit from the more secure establishment of important knowledge. There’s a really strong argument, especially right now, for expanding the role and prestige of replication, but we should keep in mind that any system of scientific assessment will still rely to a large degree on the good faith of those involved, researchers and assessors alike. There’s no way to make it bullet-proof, so the need for organisations like Retraction Watch will probably continue.