Is impact factor the “least-bad” way to judge the quality of a scientific paper?

We’ve sometimes said, paraphrasing Winston Churchill, that pre-publication peer review is the worst way to vet science, except for all the other ways that have been tried from time to time.

The authors of a new paper in PLOS Biology, Adam Eyre-Walker and Nina Stoletzki, compared three of those other ways to judge more than 6,500 papers published in 2005:

subjective post-publication peer review, the number of citations gained by a paper, and the impact factor of the journal in which the article was published

Their findings?

We conclude that the three measures of scientific merit considered here are poor; in particular subjective assessments are an error-prone, biased, and expensive method by which to assess merit. We argue that the impact factor may be the most satisfactory of the methods we have considered, since it is a form of pre-publication review. However, we emphasise that it is likely to be a very error-prone measure of merit that is qualitative, not quantitative.

(Disclosure: Ivan worked at Thomson Reuters, whose Thomson Scientific division owns the impact factor, from 2009 until the middle of this year, but was at Reuters Health, a completely separate unit of the company.)

Or, put another way, as Eyre-Walker told The Australian:

Scientists are probably the best judges of science, but they are pretty bad at it.

In an accompanying editorial, Jonathan Eisen, Catriona MacCallum, and Cameron Neylon call the paper “important” and acknowledge that the authors found that impact factor “is probably the least-bad metric amongst the small set that they analyse,” but note some limitations:

The subjective assessment of research by experts has always been considered a gold standard—an approach championed by researchers and funders alike [3][5], despite its problems [6]. Yet a key conclusion of the study is that the scores of two assessors of the same paper are only very weakly correlated (Box 1). As Eyre-Walker and Stoletzki rightly conclude, their analysis now raises serious questions about this process and, for example, the ~£60 million investment by the UK Government into the UK Research Assessment Exercise (estimated for 2008), where the work of scientists and universities are largely judged by a panel of experts and funding allocated accordingly. Although we agree with this core conclusion and applaud the paper, we take issue with their assumption of “merit” and their subsequent argument that the IF (or any other journal metric) is the best surrogate we currently have.

We have, as Retraction Watch readers may recall, extolled the virtues of post-publication peer review before.

22 thoughts on “Is impact factor the “least-bad” way to judge the quality of a scientific paper?”

  1. The problem is that it takes connections to get into the high impact journals. The typical timeline/career schedule for a high impact journal scientist is… 1) as a student, get into a lab that publishes high impact papers. 2) work really hard and make PI happy. 3) publish as first author in high impact journal. 4) become PI and publish with former PI in authors list. 5) publish in high impact journal.

    There is also the method of recruiting high impact scientists into your project, but you better know somebody because it is very hard to do. They have their own progeny to look after, so your PI or former PI better have a close connection.

    There is also faking data… which is riskier, but also successful… and immoral/unethical.

    1. That’s terribly cynical John! It depends what you mean by “high impact”, of course. We might define “high impact” as an impact factor greater than 10, although that would mean that Proc. Natl. Acad. Sci. would just miss out even though it’s a high impact journal in most people’s books.

      In my experience publishing in high impact journals (e.g. IF > 10) is straightforward so long as you produce some science that is novel, advances the field, addresses questions of substance and so on. I totally disagree with your view that it takes “connections” as if publishing in good journals is something akin to freemasonry!

      Of course it does help to make good career decisions and ensure that you work with someone good for your PhD and postdoc who is addressing substantial problems, and in an environment with good facilities and so on. But that simply ensures that you come to an early recognition of what problems are important and productive approaches to address these.

      The notion that “faking data” is a successful approach to publishing in high impact journals is astonishing. Try it and see where that leads you in the long term!

      1. I consider IF > 20 as high impact. Those are the journals that ERC grant reviewers consider a requirement for an ERC “excellence” grant. And let’s face it, every PI needs at least one Nature, Cell, or Science if they want US funding in these tough times. I would agree that between 10 and 20 it is more about quality than connections. Obviously, connections help, but you don’t need them to get into a 14 journal.

        I would never fake data. But I have seen numerous people make major career advances by doing it….even after their data has been exposed. Too big to fail happens in academia also.

        PNAS is an interesting example for sure… we all know what a disaster PNAS is when it comes to publishing. PNAS is or at least was the prime example of needing connections to publish. All you needed was to be a member… and you got one paper a year. Now you need to know an editor, though luckily they switch them out so more people have a chance to publish there. Whenever I have been involved in a paper that the group thought PNAS was an option, the first question was… “who do we know at PNAS?”

  2. It does depend on the stage at which one wishes to assess the merits of a scientific paper and there’s no question that many elements of these are necessarily (and unproblematically) subjective:

    immediate assessment:

    The impact factor is quite useful here, as are other assessments of journal quality (e.g. PageRank) that recognise the impact of a journal (e.g. J. Biol. Chem.) which has a perceived impact that is much greater than its IF would indicate.

    Love it or hate it (I don’t like it much) the IF results in a time-consuming submission game that sees many papers sent hopefully to a journal that the submitter likely already knows is realistically above the scope of the paper. On rejection, the paper is sent to a fall back journal with a lower real or perceived impact. This does have the effect of more finely grading papers according to their perceived “quality” (the subjective element of peer review can muddy this), and tends to reinforce the impact factors at both the high and low end.

    This is quite different to the situation 10 or more years ago where one knew quite quickly during the course of preparing a paper what the appropriate journal was, and it would generally find its way there without much difficulty.

    longer term assessment:

    Citations are key here. If a paper has an influence on its field then it will be highly cited. If it’s influenced the field then it’s an important paper. To a first approximation, highly-cited means important. There are exceptions of course!

    1. In the plant sciences, most likely Trends in Plant Science, which has an IF of 11.808, would be considered to be the premier journal. According to John’s definition, most likely only one plant science journal would be of “high impact”. This is incredibly low relative to other fields in the biomedical sciences. Excellent journals like The Plant Journal (Wiley; IF = 6.582), Planta (Springer; IF = 3.347), Plant Science (Elsevier; IF = 2.922), Journal of Experimental Botany (Oxford; IF = 5.242) or New Phytologist (Wiley; IF = 6.736), several of which are in the “top 10” according to IF, all have sub-standard IF scores relative to other fields. Yet, these journals conduct extremely strict and detailed peer review. Imagine, the European Journal of Horticultural Science, with over 78 years of publishing history, has a measly IF of 0.381. The IF is a game, nothing more than the ability of one company to promote the IF as a measure of quality, which it is not. It is only a measure of citations i.e., popularity, and thus fits perfectly into the Google-generation culture. Unfortunately, the IF is driving policy in countries like India, China, Iran, Turkey, and others, many of which are emerging economies, and thus this trend, and the aggressive marketing ploy by Thomson Reuters to promote the IF as a “quality” parameter, are distorting many of the basic quality issues in science publishing. Regarding the number of citations, although there is little data to prove what I claim, there is an equally aggressive ploy by predatory open access publishers to pump out as many fake or fraudulent papers as possible. In doing so, they will ensure that they get sufficiently cited and thus obtain an IF. Once that IF has been obtained, their APFs increase. And the new basic business model of many OA publishers is based on fraud and manipulation.

      Until Thomson Reuters places a ban on publishers that are found to be acting unethically (in any way), and until agencies like the ISSN stop lending support (by assigning ISSN numbers) and thus giving the predators some level of “credibility”, we will keep seeing papers like this in PLoS One, which is a respectable journal, getting praised for its focus on the IF. The scientific community is being so badly brainwashed and blinded, it’s quite amazing to see how marketing has fully taken over science.

      1. JATds, I agree with everything you say about the massive and rubbish tail of the scientific publishing enterprise. However the ability of the low rank rubbish journals to raise their status by playing self-citation games and other subterfuges is pretty limited. A journal can only really raise its status/IF by a very concerted effort to attract good quality papers (examples on both sides of this below).

        However I don’t agree that the impact factor (or other metrics like PageRank) isn’t some measure of the quality/importance/standing of a journal, and by extension of the potential quality of a paper. On balance if you get a paper in PNAS nowadays (i.e. by direct submission route) it’s likely to be addressing an important subject and have a very significant contribution to make.

        And we have to decide what is meant by “quality” anyway. A beautifully constructed study with excellent data, beautifully presented and so on, may be a very high quality piece of work, but of limited value to the field, and so unlikely to be accepted by a high impact factor journal, and unlikely to make much of an impact and to be cited (though you never can tell!)

        Two journals broadly related to my field have gone in different directions over the years. The Biochemical Journal editorial board made a longstanding effort to raise the impact of their journal by strong efforts to refocus the subject matter and to attract high quality papers with some success. They raised their IF. On the other hand Biochemistry, an outstanding journal in the 1970’s/1980’s and a bit onwards and definitely a rewarding place to publish, changed their publishing schedule from monthly to bimonthly, accepted more papers of lower impact, and seems largely to have become a repository for rather hum-drum stuff. Its impact factor has drifted downwards. In my opinion the respective IF’s of these journals are a broad indication of the quality of the work published there.

        Plant science is perhaps not as sexy a subject as biomedical research, and so high IF journals are rare as you indicate. However this presumably isn’t so important e.g. for hiring decisions since one expects the intrinsic qualities of journals and publications to be recognised appropriately by other scientists in the field. Isn’t it also likely that the groundbreaking research in some areas of plant science isn’t published in plant journals but in Nature/Science/PNAS/NuclAcidRes and the like?

    2. I agree with you on citations. That takes time of course, and exposure, which again brings us back to impact factor.

  3. The paper reminds me of the comment by William Goldman about Hollywood: “Nobody knows anything” (he was referring to how no one has any real idea how a given film will do before it is released). Science has the same problem: at the time point required to get a grant based on a given set of published research it is impossible to reliably state how good the research was. Consequently we end up with 3 similar issues (to film production): 1) The people giving out the money continue to rely on dubious metrics or fads to judge where they give money as they need to justify their decisions somehow, no matter how flimsy the logic, 2) Big names really help to get funding and publications as they are seen as a better bet, 3) The process involves a lot of talented people being deemed unsuccessful just through random chance. Sadly I don’t see a better way of fundamentally improving the process…

  4. Cripes! Here is the fuller quote:

    “Many forms of Government have been tried and will be tried in this world of sin and woe. No one pretends that democracy is perfect or all-wise. Indeed, it has been said that democracy is the worst form of government except all those other forms that have been tried from time to time.”

    Let’s remember that Churchill was a most reluctant democrat. Much like a zealous ex-cigarette smoker, his statement came about because he really had been involved with many of the alternatives. He had experienced all that sin and woe.

    We don’t have any pretence to democracy in scientific research. If we did, researchers would vote every five years for the next heads of their institutes wouldn’t they? We are nowhere near having the “worst form of [research] government except all those other forms”. It’s all totally dirigiste and trivially buzzworditic, and nowhere more so than in the good ol’ USA, the land of the free.

    Anonymous peer review is the only mechanism available to inject some sort of sense into the publication process. It buffers the process, but no more than that. It is here to stay, we need it, but it’s not very good.

    Impact factor is for unimaginative beancounting jobsworths.

    There has never been a mechanism for post-publication peer review that might yield informed scientific consent on good or bad publications. Is a scientific equivalent of the “worst form of government except all those other forms” even feasible? It’s easy to point out the potential dangers but, ultimately, who can tell? We will never know if we don’t try it.

    So here are two contemporary and on-going post-publication peer reviews. They’ve stimulated thousands of views and at least some of the posts have clearly taken the authors considerable time to compose.

    I wonder if they are in high impact journals who might not be enjoying the discourse?

    Democracy requires both informed consent and the ability to change one’s mind. Above all else, it is always allowed to occur in the public sphere.

  5. Another view:
    High impact means, by definition, strong influence of the papers published there on the readers.
    Therefore, if a paper published in a journal with IF=10 collects 100 references, its scientific value equals that of the paper that collected 10 references in a journal with IF=1 (assuming linear IF character).

    There ought to be an end to the stupidity in “assessments”. The help can only come when the author starts the paper with the description of the studied phenomenon on the elementary level (as opposed to citing references to the last publications in the area). He then formulates his question about the nature of the phenomenon and proceeds to state his approach. He presents his data and says how the data answer his initial question. The paper goes to the peers, who decide whether the author’s conclusions are correct and how important the work is.

    If the author did not do his job, no one can say how important the paper is.
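The linear equivalence proposed in comment 5 (100 references at IF=10 counting the same as 10 references at IF=1) amounts to dividing a paper's citation count by its journal's impact factor. Here is a minimal sketch of that arithmetic. The function name `if_normalised_citations` is invented for illustration, and the linear scaling is the commenter's stated assumption, not an established metric.

```python
def if_normalised_citations(citations: int, impact_factor: float) -> float:
    """Citations divided by journal IF, under the commenter's linear assumption."""
    if impact_factor <= 0:
        raise ValueError("impact_factor must be positive")
    return citations / impact_factor


# The two papers from the comment: 100 references at IF=10, 10 references at IF=1.
paper_in_high_if_journal = if_normalised_citations(100, 10.0)
paper_in_low_if_journal = if_normalised_citations(10, 1.0)

# Both ratios come out the same, which is the equivalence the comment describes.
print(paper_in_high_if_journal, paper_in_low_if_journal)
```

Under this reading, a paper only stands out if it is cited more than its journal's average would predict.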

    1. Your fine reply captures it in one sentence!

      1. Have I been naive to believe the mantra that impact factors are meant to rank journals and not articles? At least that was what we were meant to believe a few years back. The irony: we are keen to say that plagiarism software should be taken with a grain of salt, but now some seem to believe that a simple metric can rank our own hard work. Bureaucrats’ dream come true.

      2. The question itself is wrong. Instead, we should be asking whether impact factors (and all that come with them) are among the reasons that cause the current retraction business and other malpractice. As JATdS hinted above, impact factors may have some value for the clueless public, bureaucrats, and publishers, but other scientists do see beyond the myriad and craze of contemporary publishing.

  6. I agree with many of the comments here. It is a bit of a straw man to use only two reviewers to assess a paper, show that their scores are only weakly correlated, and then conclude that reviewer assessments are “error-prone.” No one would accept experimental data with only two samples per data point. It is already well established that agreement is poor between two reviewers (Rothwell and Martyn. Brain 123:1964, 2000), and a greater number of reviewers is required to achieve reasonable precision in peer review (e.g., Kaplan et al. PLoS One 3:e2761, 2008). I don’t dispute the subjectivity, bias and other shortcomings of conventional peer review, but I find the authors’ contention that impact factor is “the most satisfactory of the methods” to be quite unpersuasive. The authors’ preference for impact factor (of the journal) over number of citations (of the paper) as an indicator of the merit of a paper is particularly puzzling. They seem to contradict themselves in stating an assumption that “the number of citations are unaffected by the IF of the journal” and then later acknowledging that “the number of citations is strongly affected by the journal in which the paper is published.”

    1. “No one would accept experimental data with only two samples per data point.”
      It depends…
      “…greater number of reviewers is required to achieve reasonable precision in peer review…”
      A new industry needed?
      “the number of citations is strongly affected by the journal in which the paper is published.”
      That’s correct.

    2. Agreed. It is hard to believe that the authors think publishing in a high impact journal is superior to achieving higher citations!
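The point in comment 6 that more reviewers are needed for reasonable precision can be illustrated with the Spearman-Brown formula, which predicts the reliability of an average of k raters from the correlation between two single raters. This is only a sketch. The single-rater correlation of 0.2 is a made-up stand-in for "weakly correlated", not a figure from the paper or from the references cited in the comment, and the formula assumes all reviewers are equally reliable.

```python
def spearman_brown(single_rater_r: float, k: int) -> float:
    """Predicted reliability of the mean score of k equally reliable raters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_rater_r / (1 + (k - 1) * single_rater_r)


# Hypothetical inter-reviewer correlation, chosen only for illustration.
assumed_r = 0.2

# Reliability of the averaged score rises slowly as reviewers are added.
for reviewers in (1, 2, 5, 10, 20):
    reliability = spearman_brown(assumed_r, reviewers)
    print(f"{reviewers:>2} reviewers -> predicted reliability {reliability:.2f}")
```

With a weak starting correlation, two reviewers gain little over one, which fits the commenter's objection to drawing conclusions from pairs of assessors.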

  7. I know how to judge the quality of a paper. Did it get directly replicated or not? There are three categories. Yes, No, NA.

    1. I would take it even a step further. Has other highly cited work been built off of the work of the paper? Basically, the number of citations of original work (not reviews) that have cited a paper. I am sure that calculation is out there somewhere. Maybe TR is already doing it.

      1. Once an incorrect finding makes it into the literature it is difficult to eradicate and may be pushed ahead by publication bias. Unless there is a direct replication done by a skeptical, independent group nothing has been verified and the results reported in the paper should not be considered reliable no matter how many have cited it.

        1. I totally agree, which is why I wrote “a step further”.
          First, replicated by others, particularly people not associated with the publishing group. Then, citations of citations. There is a lot of bad work out there that no one can replicate. Millions in funding tossed at therapies that no one outside of the publishing group can repeat. Scary.

  8. It bears repeating what the fundamental flaw of the IF is for paper assessment. The distribution of citations in high IF journals follows a power-law such that a small number of papers account for most citations. (A quarter of papers have 89% of citations in Nature, if I recall correctly.) So by employing IF we are using the mean as a measure of central tendency for a very highly skewed distribution, which is misleading and nonsensical. In my opinion, the IF is a highly unambiguous, easy to calculate, and quite meaningless score. It is very hard to argue that a paper with no citations in a very high IF journal is in some sense «better» or has a higher «impact» than a paper with many (or even, some) citations in a low IF journal.

    1. “So by employing IF we are using the mean as a measure of central tendency for a very highly skewed distribution, which is misleading and nonsensical. In my opinion, the IF is a highly unambiguous, easy to calculate, and quite meaningless score.”

      Many people have said the same about comparing group averages using p values.
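The argument in comment 8 about using a mean on a highly skewed distribution can be seen with a toy example. The citation counts below are invented for illustration and do not come from any real journal. They are chosen so that a quarter of the papers collect most of the citations, roughly as the comment describes.

```python
import statistics

# Made-up citation counts for eight papers in one hypothetical journal.
citations = [0, 1, 1, 2, 3, 4, 40, 109]

# An IF-style average: total citations divided by the number of papers.
mean_citations = sum(citations) / len(citations)

# The median describes what a typical paper in the journal actually receives.
median_citations = statistics.median(citations)

# How many papers fall below the journal-level average?
below_mean = sum(1 for c in citations if c < mean_citations)

# Share of all citations earned by the top quarter of papers (two of eight).
top_quarter = sorted(citations, reverse=True)[: len(citations) // 4]
top_quarter_share = sum(top_quarter) / sum(citations)

print(mean_citations, median_citations, below_mean, round(top_quarter_share, 2))
```

In this toy set the average is 20 citations per paper, yet six of the eight papers have four or fewer. The journal-level mean therefore says little about any individual paper.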
