Peer review isn’t a core subject of this blog. We leave that to the likes of Nature’s Peer-to-Peer, or even the Dilbert Blog. But it seems relevant to look at the peer review process for any clues about how retracted papers are making their way into print.
We’re not here to defend peer review against its many critics. We have the same feelings about it that Churchill did about democracy, aka the worst form of government except for all those others that have been tried. Of course, a good number of the retractions we write about are due to misconduct, and it’s not clear how peer review, no matter how good, would detect out-and-out fraud.
Still, peer review is meant as a barrier between low-quality papers and publication, and it often comes up when critics ask questions such as, “How did that paper ever get through peer review?”
With that in mind, a paper published last week in the Annals of Emergency Medicine caught our eye. Over 14 years, 84 editors at the journal rated close to 15,000 reviews by about 1,500 reviewers. Highlights of their findings:
…92% of peer reviewers deteriorated during 14 years of study in the quality and usefulness of their reviews (as judged by editors at the time of decision), at rates unrelated to the length of their service (but moderately correlated with their mean quality score, with better-than-average reviewers decreasing at about half the rate of those below average). Only 8% improved, and those by a very small amount.
How bad did they get? The reviewers were rated on a scale of 1 to 5 in which a change of 0.5 (10%) had been earlier shown to be “clinically” important to an editor.
The average reviewer in our study would have taken 12.5 years to reach this threshold; only 3% of reviewers whose quality decreased would have reached it in less than 5 years, and even the worst would take 3.2 years. Another 35% of all reviewers would reach the threshold in 5 to 10 years, 28% in 10 to 15 years, 12% in 15 to 20 years, and 22% in 20 years or more.
So the decline was slow: roughly 0.04 points per year on the 1-to-5 scale for the average reviewer. Still, the results, note the authors, were surprising:
Such a negative overall trend is contrary to most editors’ and reviewers’ intuitive expectations and beliefs about reviewer skills and the benefits of experience.
(You might ask, “So who peer-reviewed this paper?” A newer reviewer, one would hope.)
Annals of Emergency Medicine is a reasonably high-tier journal, in the top 11% of Thomson Scientific impact factors in 2008. So what’s true for the journal may be true at other top-tier publications.
What could account for this decline? The study’s authors say it might be the same sort of decline you generally see as people get older. This is well-documented in doctors, so why shouldn’t it be true of doctors — and others — who peer review? The authors go on:
Other than the well-documented cognitive decline of humans as they age, there are other important possible causes of deterioration of performance that may play a role among scientific reviewers. Examples include premature closure of decisionmaking, less compliance with formal structural review requirements, and decay of knowledge base with time (ie, with aging more of the original knowledge base acquired in training becomes out of date). Most peer reviewers say their reviews have changed with experience, becoming shorter and focusing more on methods and larger issues; only 25% think they have improved.
Decreased cognitive performance capability may not be the only or even chief explanation. Competing career activities and loss of motivation as tasks become too familiar may contribute as well, by decreasing the time and effort spent on the task. Some research has concluded that the decreased productivity of scientists as they age is due not to different attributes or access to resources but to “investment motivation.” This is another way of saying that competition for the reviewer’s time (which is usually uncompensated) increases with seniority, as they develop (more enticing) opportunities for additional peer review, research, administrative, and leadership responsibilities and rewards. However, from the standpoint of editors and authors (or patients), whether the cause of the decrease is decreasing intrinsic cognitive ability or diminished motivation and effort does not matter. The result is the same: a less rigorous review by which to judge articles.
What can be done? The authors recommend “deliberate practice,” which
involves assessing one’s skills, accurately identifying areas of relative weakness, performing specific exercises designed to improve and extend those weaker skills, and investing high levels of concentration and hundreds or thousands of hours in the process. A key component of deliberate practice is immediate feedback on one’s performance.
There’s a problem:
But acting on prompt feedback (to guide deliberate practice) would be almost impossible for peer reviewers, who typically get no feedback (and qualitative research reveals this is one of their chief complaints).
In fact, a 2002 study in JAMA co-authored by Michael Callaham, the editor in chief of the Annals of Emergency Medicine and one of the authors of the new study, found that “Simple written feedback to reviewers seems to be an ineffective educational tool.”
What about training? A 2008 study in the Journal of the Royal Society of Medicine found that short training courses didn’t have much effect on the errors peer reviewers failed to catch. That followed a 2004 study in the BMJ with similar results. And that’s consistent with what one journal editor who looked at the Annals of Emergency Medicine study told us about his own experience, too.
That same editor suggested that another potential fix — continually recruiting less-experienced reviewers, at the top of their games — might not work either. Such reviewers, he said, often didn’t include any narratives or interpretations in their reviews, just lists of comments.
Sounds like a good subject for the next Peer Review Congress, which should be held in 2013. In the meantime, please take our poll on one specific aspect of peer reviewing.
Great post, Ivan. I have blogged some about the peer review process, and my main observation about it is that there is no base of training that people are required to have in order to review. So, a more interesting question to me is about the absolute quality of reviews, rather than some minute deterioration over time for each individual reviewer. And of course, my cognitive bias tells me that, while this is true for my peers, it is not for me — my reviews continue to be stellar!
One should realize, at least in chemistry, that established researchers often delegate paper refereeing to their postdocs and graduate students. In theory, the PI acts as the final quality control on the review. In practice, I have seen reviews go out the door almost unchecked (the volume of refereeing is unbelievable).
I don’t know if the above study captures this behavior, but the more senior PI is likely to delegate more and more of this type of work over time.
I think more and more senior PIs are becoming out of touch with lab techniques and unable to spot inconsistencies in results obtained with a given method, as is the case for most papers. In my view, a paper gets a thorough review when it is reviewed by a good postdoc, or even a PhD student, who is well aware that the data presented cannot be obtained with the technique described. I'm not implying this is true in all cases, but it is certainly true in biology and immunology. Most senior PIs don't even read the methods section, and I think this is where most of the problems can be detected.
A glimpse from an engineering department:
The prof gets overwhelmed with reviews. All of them get delegated to PhD students and postdocs. Reviews are not proofread by the prof.
Fresh PhD students write nice, textbook-worthy reviews and invest lots of time. As time goes by, reviews deteriorate to five sentences.
Just two out of many reasons:
– Some review portals let you see what other reviewers have written. While you have dissected the paper over two pages, your colleague did it in three sentences, apparently without reading the paper.
– Reviewing a paper that is a complete fail on so many levels; pointing it all out in a painstakingly long review; and *only* receiving an automated thank-you e-mail three months later that tells you, in a cheery tone, that the reviewed paper is now published.
The absolute lack of feedback and public accountability has broken peer review. If it ever worked.
Is post-publication review the solution?
Thanks for the interesting comments on our paper. I agree with almost all of them. We try at our journal to improve on some of these failings – we have a validated rating scale and definitions of what we seek in a review, and reviewers who don’t reliably provide them are used less and less (but not never, as speciality area coverage is always needed). All our reviewers also see the complete reviews of others on the same manuscript, and the decision letter from the editor (which is usually pretty specific about weaknesses, etc.).

We recognize reviewers in many ways: we list the top 50 performers in quality in the journal each year and send them letters of recognition, and those who achieve this list twice in four years are listed in every issue as Senior Reviewers. Still, I can think of no way to provide the kind of detailed feedback reviewers want (and some really need) to improve.

The point about delegating reviews to fellows or junior faculty is very true; we ask our reviewers to ask the editor’s permission to do so, but it is very rare that we get such a request (whatever that might mean).
Finally I too think the motivation factor is the key issue. I still review occasionally but am just not willing to articulate all my concerns and reasoning at the level of detail that I used to. There are just too many competing obligations, and just as in clinical cases that were fascinating when I was 30 but now are utterly mundane and predictable, many manuscripts also fall into predictable and easily recognizable categories.
The amazing thing about peer review is not its failings, but that it works at all, and that it has worked as well as it has. We know we can’t do without it – just read the blogosphere to see why.
Michael Callaham
Interesting blog.
I am actually surprised to learn that you (apparently) consider it to be the reviewers' duty to check for plagiarism. I would argue that the most efficient way to automatically check for plagiarism is for the journal itself to run each submitted manuscript through a professional software program that detects plagiarism.
Thanks for the comment. We don’t consider it reviewers’ duty to check for plagiarism; we’ve argued in many posts — and in presentations such as this one: http://www.retractionwatch.com/2011/11/21/the-good-the-bad-and-the-ugly-what-retractions-say-about-scientific-transparency/ — that editors should use plagiarism detection software. We were just asking here whether peer reviewers check.