Estimate: Nearly 33,000 papers include misidentified cell lines. Experts talk ways to combat growing problem

Willem Halffman
Serge Horbach

Although most researchers realize too many scientists are using misidentified cell lines in their work, they may be shocked by the scope of the problem: Approximately 32,755 articles report on research that relied on misidentified cells, according to a new report in PLoS ONE. And even though more people may be aware of the problem, that awareness hasn't slowed it down: Most of the papers the authors flagged were written after 2000, and the number of new publications relying on misidentified cells continues to grow. We've tackled the issue before. A 2015 poll of RW readers showed most believed that papers reporting data from misidentified cell lines should be either retracted or corrected, and our co-founders have recommended that journals at least post "expressions of concern." We spoke with the authors of the latest paper (also covered by The Scientist), Serge Horbach and Willem Halffman of Radboud University Nijmegen in the Netherlands.

Retraction Watch: You estimate that more than 32,000 articles used misidentified cells. That's a very large number, to say the least. Were you surprised by the scope of the problem?

Serge Horbach and Willem Halffman: Yes, initially we were surprised by the scope of the problem. However, the number of articles using misidentified cells should be considered in light of the total number of articles on cell lines. On that basis, we estimate the contaminated literature makes up about 0.8% of the total literature on cell lines. This is still substantial, but it puts the number into perspective.

We were considerably more surprised that the number of articles on misidentified cells is still growing. Given that problems with cell line misidentification have been known since the 1960s, and given the numerous attempts to tackle the problem since then, we expected the annual number of published articles to have decreased by now. But we found the opposite. Our analysis demonstrates that both the number of articles using misidentified cell lines and the number of articles citing them are still growing. Moreover, as we demonstrate in three case studies, scientists show little awareness that cell lines may be misidentified. This clearly indicates that simply making scientists aware of the issues with misidentified cells is not sufficient to deter them from using the cells or citing papers that use them.

RW: You note that the problem of using misidentified cells appears to have gotten worse over time, despite more awareness of the issue. Why do you think the problem has worsened instead of improved?

SH and WH: Despite the enormous efforts of some people to raise awareness of the issue and to develop tools to help address it, this awareness still does not seem to have trickled down to all labs. In addition, we were puzzled by the reluctance of some researchers working with misidentified cells to acknowledge the misidentification. We get reactions such as: "Oh yes, we know the cells might be misidentified, but it makes no difference for my conclusions." We really do not see how this can be a reason not to signal misidentification or to prevent it from happening. Signalling misidentifications should not be considered an admission of failure, but a sign of strength, a celebration of the self-correcting power of research. Especially if conclusions are not affected, there is really no reason to resist a warning label. However, the current reward system in science clearly does not motivate individual researchers to take such steps.

RW: Where do the papers relying on contaminated lines originate?

SH and WH: We analysed the origin of the contaminated papers and found that this is a genuinely global issue. Papers using contaminated cell lines are published by scientists affiliated with universities all over the world. We were surprised by the number of papers originating from countries with well-established research cultures. The existing literature suggested that cell line misidentification might be primarily an issue in regions with new or emerging research communities, where training or access to testing facilities may be limited. However, the largest share of contaminated articles in our sample originates from the US (36%), and when we compare the number of articles on misidentified cells with the total number of articles on cell research, Japan tops our list. Several European countries, including Scandinavian ones, also rank highly.

RW: You suggest that publications that rely on misidentified cells for their conclusions should come with some kind of warning label, such as an expression of concern (EoC). (Our co-founders recommended something similar in a 2016 column.) Why? Are there any instances where the papers should be left alone, or even retracted?

SH and WH: We believe that Expressions of Concern are an appropriate way of labelling the affected literature. Such an EoC could serve multiple purposes. First, it would act as a warning signal. If the consequences of the misidentification for the article's conclusions are clear and uncontested, they could be reported; otherwise, the expression of concern could simply state: "Cell line X in this study is known to be misidentified and is actually Y. See Z for more information." The interpretation of this warning is then entirely up to the expert reader. Second, such notifications would help preserve as much valuable data as possible: Data reported on a misidentified cell line might still be entirely valid, provided the true origin of the cell line is clear and the characteristics of the cells were not crucial for the study. Hence it might be a waste of funds and effort to automatically dismiss these data. In cases where the use of these cell lines leads to (severely) false conclusions that could have a major impact on future research, articles could be retracted.

RW: EoCs will address the problems in previously published papers, but what about the ongoing use of contaminated lines in newly published papers? Are current efforts to slow their use working? If not, what else needs to be done?

SH and WH: We think there is a major task here for journal editors. First, it would be fantastic if a few journals picked up our results and demonstrated leadership in the research community by labelling publications based on misidentified cells. The process would not be easy: It would start with a literature search, but editors would then need to examine individual publications to verify that their claims are based on misidentified cells.

We found that it is surprisingly hard to determine exactly which publications make use of misidentified cells. To make such articles easy to identify in the future, we recommend that authors name the cell lines they used in easily searchable parts of their article, such as the keywords or the abstract. Some journals have already suggested and implemented measures in this direction, but implementation generally seems to be slow. Some journals now also use a system of Research Resource Identifiers (RRIDs), which might help in tackling cell line misidentification.

To prevent new articles based on misidentified cells from appearing, it has been proposed that journals make (genetic) tests of cell lines mandatory for newly submitted manuscripts. With currently available techniques, such tests need not be very costly or time-consuming, and they seem to be very effective at detecting misidentifications. Again, some journals have already implemented such measures, but they are far from being adopted generally.

RW: How confident are you in your totals? Could some of the articles that make up the 32,000 figure actually have used authentic cell lines, and might there be a large number of articles using misidentified lines that aren't included in that total? Why or why not?

SH and WH: Our dataset has several limitations, which probably make our estimate a conservative one. To identify the articles, we used the list of misidentified cell lines from the International Cell Line Authentication Committee (ICLAC). It is generally accepted that cell lines on this list are genuinely misidentified, and we only used those cell lines from the list for which no authentic stock has been reported: As far as is known, the cells they were claimed to be no longer exist.

Because of some difficulties with our search methods (some cell lines have very generic names, such as 'WISH', 'EU-1' or 'OF', and others have multiple spellings, such as 'Intestine-407', 'Int-407', 'Int407', etc.), we were not able to use the entire ICLAC list. We therefore excluded several cell lines from our search, which clearly makes our estimate conservative.

All this does not mean that our list of articles is perfect, nor that all articles in our list actually use misidentified cells. The list inevitably contains several false positives. We therefore stress that our data provide an estimate of the size of the contaminated research literature, but they are certainly not sufficiently precise to automatically identify papers, or accuse individual researchers, research teams or institutes.
