Retraction Watch

Tracking retractions as a window into the scientific process

We are judging individuals and institutions unfairly. Here’s what needs to change.


Yves Gingras


The way we rank individuals and institutions simply does not work, argues Yves Gingras, Canada Research Chair in the History and Sociology of Science, based at the University of Quebec in Montreal. He should know: In 1997, he cofounded the Observatoire des sciences et des technologies, which measures innovation in science and technology, and where he is now scientific director. In 2014, he wrote a book detailing the problems with our current ranking system, which has now been translated into English. Below, he shares some of his conclusions from “Bibliometrics and Research Evaluation: Uses and Abuses.”

Retraction Watch: You equate modern bibliometric rankings of academic performance to the fable about the Emperor’s New Clothes, in which no one dares to tell a leader that he is not wearing an invisible suit – rather, he is naked. Why did you choose that metaphor?

Yves Gingras: Most experts in bibliometrics have clearly shown, for many years, that most of the indicators used in academic rankings are invalid. That raises a real question about the capacity for critical thinking on the part of the users of those rankings. So, I thought the best pedagogical image to make that salient was the classic tale The Emperor’s New Clothes by the Danish author Hans Christian Andersen. One just had to replace “clothes” with “ranking” and the tale still made perfect sense!

RW: Which specific systems of academic rankings do you believe are most problematic, and why?

YG: First we must clearly distinguish between indicators, rankings and evaluation. An indicator is a variable used to measure a concept. Hence a thermometer provides an indicator to measure “temperature” (not “humidity”). Rankings use indicators to order units on a given scale. So before accepting any ranking, one must first make sure that the indicators it uses are valid. If the indicators are not valid, then one should not use that ranking. Period. In the book, I show in detail why most indicators used in existing rankings like the Shanghai and Times Higher Education World University rankings are invalid. In addition to using ill-defined indicators, they compound the problem by using arbitrary weights to combine them into a single number, and then rank institutions in descending order of that number. Presently, the only valid approach is the one provided by the CWTS Leiden Ranking, for it does not combine heterogeneous measures into a single one and uses a well-defined measure based on aggregated and normalized citations. It compares universities using different indicators, like impact or collaboration, letting users choose the one they prefer as a measure of the dimension they want to evaluate.
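The point about arbitrary weights can be illustrated with a small sketch. The universities, indicator names, values and weights below are all hypothetical and are not taken from any real ranking; the only aim is to show that the same indicator data can produce opposite orderings once the weights change.

```python
# Hypothetical indicator values (already scaled 0-100) for three made-up universities.
scores = {
    "Univ A": {"citations": 90, "awards": 20, "reputation": 60},
    "Univ B": {"citations": 60, "awards": 80, "reputation": 50},
    "Univ C": {"citations": 70, "awards": 50, "reputation": 70},
}


def composite_ranking(scores, weights):
    """Rank institutions by a weighted sum of their indicators (best first)."""
    totals = {
        name: sum(weights[key] * value for key, value in indicators.items())
        for name, indicators in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Two equally arbitrary weighting schemes (each sums to 1).
weights_1 = {"citations": 0.6, "awards": 0.1, "reputation": 0.3}
weights_2 = {"citations": 0.2, "awards": 0.6, "reputation": 0.2}

ranking_1 = composite_ranking(scores, weights_1)  # Univ A first, Univ B last
ranking_2 = composite_ranking(scores, weights_2)  # Univ B first, Univ A last
```

Nothing about the institutions changed between the two rankings. Only the weights did, which is why a single composite number says as much about the weighting scheme as about the institutions being ranked.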

RW: What’s the negative impact to individual researchers and universities of these ranking systems?

YG: The irony is that it has been shown that only about 2% of international students in Europe look at those rankings to choose an institution. So managers who try to adapt their policies to move their institutions up the rankings are acting on an illusion. The problem is that their actions may nonetheless have real effects, because they ask departments and professors to behave in a manner that ranks them higher without making the research or teaching really better! This also affects individual researchers, who may be pressured to adapt even their research agenda in the direction favored by the ranking. Finally, it is worth noting that the belief in rankings can push institutions to behave in ethically dubious ways, for instance by offering prominent researchers “dummy affiliations” to boost the institution’s ranking.

RW: What’s the alternative? How can we establish a way to quantify the achievements of a researcher, region, or institution, if not under the current system?

YG: When we talk about evaluation, one must understand that the indicators used (say, number of papers or citations) must be adapted to the scale of the unit being evaluated. A university (or a country) is a large organization for which aggregate data make sense, are relatively stable and generally change slowly over time. Looking at the total number of papers for a given amount of research money and number of professors, and at the total citations received while taking into account the differences between fields (like genomics and mathematics), certainly makes sense and is useful. I have been doing that for twenty years now at our Observatory of Science and Technology (OST). But for evaluating a given researcher, the scale is micro and the numbers fluctuate much more. There, the best approach remains peer review by people able to evaluate the content of the research. Only experts in the field can judge the impact of a given piece of research and know the value of the papers published. An important report by the Council of Canadian Academies insisted upon the fact that indicators cannot replace judgment.

This can in fact be formulated as a syllogism: If one does not know the field to be evaluated, then one needs the crutch of external measures like impact factors or rankings; but then one should not be an evaluator, since one does not know the field. Conversely, if the evaluator does know the field, he or she does not need the crutch of “impact factors” to distinguish a legitimate journal from a predatory one, or a good paper from a bad one. As I show in the book, one can interpret the tendency to use quantitative indicators of all sorts for evaluating individual researchers as a strategy to replace real peers with other kinds of evaluators who do not have to know the field in order to make decisions. This amounts to a deskilling of research evaluation and peer review.
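For the institution-level case mentioned above, where citations are compared while taking field differences into account, one common approach is to divide each paper's citations by the average for its field before aggregating. The sketch below assumes that approach; the field averages and paper counts are hypothetical, and it is not presented as the method used at the OST.

```python
from statistics import mean

# Hypothetical field baselines: average citations per paper in each field.
field_average = {"genomics": 40.0, "mathematics": 4.0}

# Hypothetical papers from one institution: (field, citations received).
papers = [
    ("genomics", 60),
    ("genomics", 20),
    ("mathematics", 8),
    ("mathematics", 4),
]


def normalized_impact(papers, field_average):
    """Average of each paper's citations divided by its field's baseline."""
    return mean(cites / field_average[field] for field, cites in papers)


# Raw average ignores fields; the normalized score compares each paper to its own field.
raw_mean = mean(cites for _, cites in papers)
impact = normalized_impact(papers, field_average)
```

With these made-up numbers, the mathematics paper with 8 citations scores 2.0 against its field, higher than the genomics paper with 60 citations, which scores 1.5. A raw count would rank them the other way around.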

RW: Anything else you’d like to add? 

YG: There is one aspect that I think is too often neglected in debates about bibliometrics and evaluation. Even once an indicator is valid, one still has to settle on the source of the data from which it is calculated. This raises the crucial issue of the ethics of evaluation. One should not evaluate anybody based on data sources that are not transparent and controlled. So the recent multiplication of private companies wanting to sell their services and private “black boxes” of data to evaluate researchers is dangerous. University managers who buy those services without due regard to the ethical aspects of research evaluation again behave like Andersen’s Emperor, who had been persuaded by people pretending to be tailors that they could weave him a beautiful garment, when in reality it was a scam.


Written by Alison McCook

November 8th, 2016 at 11:30 am

  • Oliver C. Schultheiss November 9, 2016 at 4:14 am

    On a different note: Can the election be retracted?
