Half of researchers have reported trouble reproducing published findings: MD Anderson survey

Readers of this blog — and anyone who has been following the Anil Potti saga — know that MD Anderson Cancer Center was the source of initial concerns about the reproducibility of the studies Potti and his supervisor, Joseph Nevins, were publishing in high-profile journals. So the Houston institution has a reputation for taking issues of data quality seriously. (We can say that with a straight face even though one MD Anderson researcher, Bharat Aggarwal, has threatened to sue us for reporting on an institutional investigation into his work, and on several corrections, withdrawals, and Expressions of Concern.)

We think, therefore, that it’s worth paying attention to a new study in PLOS ONE, “A Survey on Data Reproducibility in Cancer Research Provides Insights into Our Limited Ability to Translate Findings from the Laboratory to the Clinic,” by a group of MD Anderson researchers. They found that about half of scientists at the prominent cancer hospital report being unable to reproduce data in at least one previously published study. The number approaches 60% for faculty members:

Of note, some of the non-repeatable data were published in well-known and respected journals including several high impact journals (impact factor >20).

To be sure, the paper isn’t saying that half of all results are bogus (although it might sometimes feel that way). As the authors write:

We identified that over half of investigators have had at least one experience of not being able to validate previously reported data. This finding is very alarming as scientific knowledge and advancement are based upon peer-reviewed publications, the cornerstone of access to “presumed” knowledge. If the seminal findings from a research manuscript are not reproducible, the consequences are numerous. Some suspect findings may lead to the development of entire drug development or biomarker programs that are doomed to fail. As stated in our survey, some mentors will continue to pursue their hypotheses based on unreliable data, pressuring trainees to publish their own suspect data, and propagating scientific “myths”. Sadly, when authors were contacted, almost half responded negatively or indifferently.

The researchers blame the problem on a familiar bugbear — the pressure to publish:

Our survey also provides insight regarding the pressure to publish in order to maintain a current position or to promote ones scientific career. Almost one third of all trainees felt pressure to prove a mentor’s hypothesis even when data did not support it. This is an unfortunate dilemma, as not proving a hypothesis could be misinterpreted by the mentor as not knowing how to perform scientific experiments. Furthermore, many of these trainees are visiting scientists from outside the US who rely on their trainee positions to maintain visa status that affect themselves and their families in our country. This statement was observed in our “comments” section of the survey, and it was a finding that provided insight into the far reaching consequences of the pressure to publish.

Of course, pressure to publish doesn’t necessarily equate to bad data. Here the authors, led by post-doc Aaron Mobley, point to other reports of the rise in retractions and the role of misconduct in such removals, to suggest, gently, an explanation for their results:

Recently, the New York Times published an article about the rise of retracted papers in the past few years compared to previous decades [3]. The article states that this larger number may simply be a result of increased availability and thus scrutiny of journal articles due to web access. Alternatively, the article highlighted that the increase in retractions could be due to something much worse; misconduct by investigators struggling to survive as scientists during an era of scarce funding. This latter explanation is supported by another study, which suggested that the most prevalent reason for retraction is misconduct. In their review of all retracted articles indexed in Pubmed (over 2,000 articles) these authors discovered that 67.4% of retracted articles had been retracted due to misconduct [4]. Regardless of the reasons for the irreproducible data, these inaccurate findings may be costing the scientific community, and the patients who count on its work, time, money, and more importantly, a chance to identify effective therapeutics and biomarkers based on sound preclinical work.

Whatever the reason, the issue of data integrity has nettled clinical oncology for years. One of the co-authors, Lee Ellis, co-authored a 2012 article in Nature lamenting the high “failure rate” of clinical cancer trials:

Unquestionably, a significant contributor to failure in oncology trials is the quality of published preclinical data. Drug development relies heavily on the literature, especially with regards to new targets and biology. Moreover, clinical endpoints in cancer are defined mainly in terms of patient survival, rather than by the intermediate endpoints seen in other disciplines (for example, cholesterol levels for statins). Thus, it takes many years before the clinical applicability of initial preclinical observations is known. The results of preclinical studies must therefore be very robust to withstand the rigours and challenges of clinical trials, stemming from the heterogeneity of both tumours and patients.

The newly published survey had an overall response rate of less than 18% — a low mark the investigators attribute in part to fears among staff that the 20-question, online survey would not be anonymous, as promised. Questions in the survey included:

  • When you contacted the author, how was your inquiry received?
  • If you did not contact the authors of the original finding, why not?
  • If you have ever felt pressured to publish findings of which you had doubt, then by whom (mentor, lab chief, more advanced post-doc, other)?
  • If you perform an experiment 10 times, what percent of the time must a result be consistent for your lab to deem it reproducible?

Leonard Zwelling, last author of the paper, told us that he believes misconduct as classically defined — the “FFP” of fabrication, falsification and plagiarism — is on the rise.

My guess is that there is more now than there used to be because there can be. There are so many tricks you can do with computers. … The other thing that’s a really huge issue is that too many people are not keeping real notebooks; their data are all in the computer.

And those data are readily manipulable, said Zwelling, who as research integrity officer at MD Anderson drafted a set of guidelines (“that were not widely adopted,” he said) to require investigators to keep signed and dated lab notebooks.

Zwelling told us that he lays the blame for whatever sickness afflicts science squarely at the door of the academic edifice.

I blame this completely and totally on us. We haven’t policed ourselves. There’s such a de-emphasis on real quality and an emphasis on quantity. I would argue that it’s a lack of ethics. The watchword used to be, I’m going to do right, I’m going to do good. Now the watchword is I’m going to do what I can get away with.

That’s not unique to science, of course. Such an errant moral compass also applies today in politics, sports and other fields. But Zwelling said the problem in science — and especially in the field of cancer research — has caused good money to chase at best unproven projects with unjustifiable intensity.

Hat tip: Rolf Degen

Comments

  1. Let’s face it, PLOS ONE typically publishes papers that have been rejected multiple times by more traditional outlets with more rigorous peer review. So, the question really is: what is wrong with this paper?

  2. An excellent comment: “…has caused good money to chase at best unproven projects with unjustifiable intensity.”

    And yet, even today, good money is perhaps being thrown at research with little chance of ever being translated into the clinic. Grants may well be justified and peer reviewed, but questions remain about whether the justification and peer review are performed with due diligence – are researchers scratching each other’s backs during the grant review process?

    I suspect the number of spectacular failures during translation of “bench to bedside” will increase unless we stamp out science fraud. There is therefore justification for society to provide paid and authoritative science-fraud investigators.

    Anyone can do work in mice – cancer cell inoculation and subsequent drug treatments. Anyone can do work in cell lines – showing expression and repression of cancer cell markers by treatment X or treatment Y.

    But all the methods, techniques and raw-data reporting have to be fully open and accessible for all to see on request. If the data are not accessible I would not recommend the work be translated to patient trials. The data included in proposed grant applications should also be made available for all to see. In today’s electronic society that would be very easy to set up.

    1. Here are some of the adverse effects from FFP (fabrication, falsification & plagiarism):
      1. The fraudsters get public money (in the form of grants) by deception.
      2. This deprives honest academics of grants, since resources are limited.
      3. Decision-makers (in businesses, hospitals, etc.) make wrong choices, as these are based on fraudulent publications.
      4. Taxpayers around the world are robbed at an unprecedented rate – billions of dollars.

      Everyone agrees that in academic publishing the whole system is based on Honesty.
      However, in “The Truth About Dishonesty” Dan Ariely points out that:
      “Having a motivation influences how you see reality.”

      Ariely notes that people think that: “As long as we kick the bad people everything will be fine. But the reality is that we all have that capacity to be quite BAD under the right circumstances” (i.e. opportunities to be dishonest and to get away with it: remember that in Stapel’s case the investigating committee openly admits that “most troubling about the research culture are the plentiful opportunities and incentives for fraud”).

      Paraphrasing Dan Ariely (who speaks about the banking system), I’d say that in academic publishing
      “We have created the right circumstances for everybody to misbehave. And because of that, it’s NOT such a matter of kicking some people out and getting new people in. It’s about changing the incentives structure.”

      Ariely provides the scientific grounds for what I have been saying for over a year now on Retraction Watch:
      not a cosmetic fix, but a complete overhaul, one that also includes editors, publishers, and institutions.

  3. Amazing study with unsurprising results. Amazing, because it says something critical about scientific work at MD Anderson yet was conducted by researchers employed at MD Anderson and approved by its Institutional Review Board. I think similar impressions can be found at many institutions: the higher an institution’s reputation and the better it is funded, the higher the expectations on researchers to publish in high-impact-factor journals. Unsurprising, because the conclusions drawn from this study reflect a general consensus in the science community about how academia and research facilities nowadays work and produce scientific data. The trainee questions from table 2 reflect very nicely how the system works in some institutions: mentors not rejecting invalid hypotheses and insisting that trainees stick to a project in order to publish in high-impact journals. The reality is that not every project will lead to something of great importance.

    Certain structural changes have to be made to improve experimental research. One would be putting greater responsibility into the hands of mentors and institutions for the quality of the results. And, as Stewart mentioned before, more accessibility to raw data could help improve the validity of results from experimental research. Misconduct in cancer research is not only about money. It is also about animal welfare and about patient safety.

  4. No no no! This is all wrong and I find it shocking that there is such naivete in the research community.

    “This finding is very alarming as scientific knowledge and advancement are based upon peer-reviewed publications, the cornerstone of access to “presumed” knowledge.”

    No no no! Peer-review is not, I repeat IS NOT, a form of gold-stamp seal-of-approval. It does not certify that the results are true. It does not authenticate the work or offer any sort of approved status. Peer review is advice given to the editors that a manuscript is a) free of major, obvious known errors; b) within the subject and scope of the journal, and c) of sufficient ‘importance’ to justify publication in that journal.

    Science does not progress through a process whereby Authority A does Experiment Y and is published by approval of Authority B. Instead, science progresses as other people, doing different experiments, publish further work that supports or contradicts previous work. In fact, peer review is not relevant to the core process.

    In research you are (hopefully) looking for information in a subject area where the truth is not actually known. Each experiment or observation is (we hope) pointing in the direction of truth. But it may not be.

    You can do your work openly, honestly and rigorously and yet the truth (or otherwise) of your conclusions does not depend on the reviews of your manuscripts. The truth depends exclusively on the results of future work that may confirm or reject your past work.

    I’m shocked that people find it surprising that repeating experiments is difficult. Creating novel scientific information is difficult (even if for no other reason than if it were easy those experiments would have been done already). We are all operating in areas where the true answer is unknown, the important parameters are unknown, and the pitfalls are unknown until you actually fall into every one.

    The idea that you could take a promising biomedical result, and translate that rapidly to clinical practice, is so unrealistic as to be shocking.

    1. Dan Z’s statement that “the idea that you could take a promising biomedical result, and translate that rapidly to clinical practice, is so unrealistic as to be shocking” is an interesting viewpoint. It raises questions about the validity of doing such experiments at all. Dan, do you think we should ban such experiments altogether?

      Science is a search for truth, and it requires a practitioner with skills sharpened over several decades to enable a fruitful search. Too many labs rely on unskilled students and technicians who lack the necessary know-how and experience. So when such individuals repeat experiments, the variation is so great that no statistical significance is seen between groups. But was the repeated work done correctly?

      We must not permit the science fraud of a few to divert us from that search, or lead us to think that all work that cannot be repeated is fraudulent. There are legitimate reasons why experiments cannot be repeated, and until we address those issues we will not move forward.

      The outcome of this is a total lack of patient benefit and scientists arguing amongst themselves. The system needs an overhaul.

      1. “Rapidly” being the key word here. Dan did not say it is impossible, only that it is not rapid. I don’t think he was suggesting banning these experiments.

  5. An experiment HAS to be reproducible. It may be technically challenging, but it must be reproducible. Whether the findings translate is a separate question. What is true in a cultured cell or an animal model may not be true in a human. This is not the problem. The problem is the lack of reproducibility of the original experiment, alongside the fabrication and hidden selection of data. Associated is the difficulty of challenging published data. The argument (nicely torn to shreds by Dan Z) in defense of problem papers is “it is peer reviewed so it must be right”. Just look at the arsenate DNA saga: this was exactly the defence used by the authors of the original Science paper against Rosie Redfield’s challenge. I have direct experience of same argument being used to defend a challenge to the existence of stripes on nanoparticles.
    The massive growth of science in the last 40 years means that our previous methods of challenging data at meetings are no longer effective: you need a big stadium to cater for just one field. The only way I can see to solve the problem is open access to papers and the underlying data. The internet can easily cater for hundreds of stadia.

  6. The issue is not just whether the experiment was reproducible. The bigger question is: having found that their experiment could not be reproduced, how many of them corrected or amended the record?
    I am guessing the number would be a very round number – or quite close to that very round number.

  7. The most obvious question is about positive result publication bias and statistical significance; if results were published randomly, I should be able to reproduce nineteen out of twenty two-sigma results; but they’re not, so it’s lower. Are they unable to reproduce two sigma results, or twenty sigma results? How many times have they tried to reproduce a result?

    Without knowing either of those, “half of scientists have tried to reproduce results, and failed to do so at least once” has no real information.
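    The commenter’s point about two-sigma results and publication bias can be sketched in a quick simulation (my own illustration, not from the survey or the comment; the fraction of real effects and the effect size are arbitrary assumptions): publish only results that clear a two-sigma filter, then attempt one independent replication of each published result.

    ```python
    # Sketch: how positive-result publication bias drags the replication rate of
    # "two-sigma" results well below the naive 19-out-of-20. Parameters are
    # illustrative assumptions, not estimates from any real dataset.
    import random

    def replication_rate(true_rate=0.1, effect=2.5, n=200_000, seed=0):
        """Fraction of published (z > 2) results whose independent repeat also hits z > 2."""
        rng = random.Random(seed)
        published = replicated = 0
        for _ in range(n):
            real = rng.random() < true_rate          # is the tested effect real?
            shift = effect if real else 0.0          # real effects shift the z-score
            if rng.gauss(shift, 1.0) > 2:            # two-sigma filter -> published
                published += 1
                if rng.gauss(shift, 1.0) > 2:        # independent replication attempt
                    replicated += 1
        return replicated / published

    print(round(replication_rate(), 2))  # well below the naive 0.95
    ```

    With only 10% of tested hypotheses real, most published two-sigma hits are false positives that fail on the second try — which is the commenter’s reason the observed failure rate tells us little without knowing how often replication was attempted.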

  8. What is missing from this survey is how many times experiments were not reproducible compared to how many were reproducible. Otherwise the data are not very meaningful, they could mean 1% of published experiments are irreproducible, or 40%, or 50%, or any other number (maybe ~50% of scientists never try to reproduce experiments). This kind of flaw also plagues the study reported in Nature (Begley, Nature, v483, p531–533), which investigated a non-random selection of published results, failed to disclose the actual data, and used anecdote to support their findings.

    I fully support a more thorough publication process, and specifically would support a proposal that included publication of *all* the data used to support conclusions reported, rather than “representative” data. However, those who cast stones should ensure that their own studies are impeccable.

  9. I expect we’re going to see more of this angst over struggles to develop new drugs/therapies, especially of the sort that Forbes linked to in his post above.

    The problems in the pharmaceutical industry are largely due to the fact that most of the “easy” drugs have been developed. The post-war period through the 1980s produced loads of efficacious (and some not so) drugs during a period when much of the cell biology underlying efficacy was obscure. Now we know a vast amount of the cell biology, and the structures of numerous potential drug targets, and yet this astonishing knowledge can’t be progressed to new drugs. I expect we may be coming towards the end of the progression of novel drug-based medicine. In any case we know how to construct societies in which people can live relatively healthily into their 80s, and if our aim was truly to take another big bite out of the cancer scourge we could do so by targeting lifestyle and diet, and promoting more widespread screening.

    In this situation it’s not surprising that we’re seeing more of this “blame the pre-clinical scientists”. It may be that oncology has some particular problems, but some of the statements concerning the inability to reproduce published data and the insinuations about scientific malpractice are silly. In my experience, if a pharma company is serious about pursuing a particular line, they enter into some sort of collaboration with the pre-clinical researchers to assess potential and ease the research along productive lines (or wait until the methodology is sufficiently developed that reproduction is established — and perhaps then buy the technology). The notion that one can expect to lift some method out of some paper and expect it to lead to a productive investigative line is naive — part of the point of the pharma industry is to assess intelligently the potential of lines of research, and this is bound to involve sifting through quite a bit of early-stage pre-clinical research to find the few potential gems.

    Just like in the past the efforts of pre-clinical scientists will no doubt lead to the next breakthroughs in medical efficacies – I expect many of these won’t be drug-based.

    1. Chris, everything you wrote depends upon scientists being honest.

      How can big pharma assess potential drug candidates intelligently if scientist A in lab A has a drug which cures cancer in a preclinical model (there is really no anti-cancer effect of the drug; it’s just that they are good at science fraud in lab A), whereas scientist B in lab B has a drug which only reduced cancer by 45% in a preclinical model (this is real, as lab B are honest)?

      In the world we live in now, with restricted funding, a lot of time and money will be wasted on pursuing lab A’s ‘superdrug’ with no benefit other than paying salaries to those who did the science fraud in the first place.

      1. In my experience scientists are pretty honest but one does have to approach reproducibility with some realism.

        I’ve several times been unable to get a published technique to work. Often it doesn’t matter – you just get on with stuff another way. Sometimes it seems to be important and you make the effort to address the problem. In several instances I’ve visited the publishing lab to learn how to do their technique. I’d be very surprised indeed if you could always reproduce protocols in published papers.

        I’ve visited a number of labs in pharmaceutical companies. It doesn’t surprise me that they might be unable to get published protocols to work. The labs I’ve visited usually have very strict regimens for performing experiments and it’s often important that in house procedures are followed. If a published observation doesn’t fit into their particular methodologies it’s quite likely not to work as reported.

        Again it comes down to how important one thinks the published procedures are. If a company is truly interested in pursuing something then they need to do the work of addressing problems especially through contact/collaboration with the authors.

        And the idea that one would waste a huge amount of time and money on a published protocol that seems not to work in one’s own hands is just silly. You usually find out quite quickly if something isn’t working. In that case you address the situation with a bit of intelligence rather than throwing money and effort at it willy-nilly.

        The unfortunate thing is that we’re talking hypothetically and anecdotally about this. The poor PLOS paper is pure anecdote as are many of the comments of pharmaceutical spokespersons as in the link that Forbes posted above. I don’t find that sort of stuff very scientific or convincing…

  10. The surprise and alarm demonstrated here reflect the common misconception that well-done biological experimentation is clean and devoid of meaningful variability. We are not dealing with analytical chemistry. Biological experiments can be amazingly non-robust, with the smallest differences in experimental technique creating large and significant differences in results obtained. Anyone who has tried to validate a bioassay knows that it can be challenging enough to get experiments to repeat within a laboratory, much less between laboratories. This is why any validation process requires testing multiple analysts, why FDA-approved validations require SOPs and not protocols, and why validated assays must only use validated reagents. Even then, good laboratory practices can differ between labs and institutions. Inability to reproduce published results is an expected result of biological experimentation. It has nothing to do with ethics or fraud or laziness.

    It would be an impossible task to expect academic researchers to operate using regulatory guidelines. Understanding the limitations of published research is a key component of doing this type of work. If you are interested in pursuing a published result, you try to reproduce the data. If you can’t, you try to resolve the issue. If this doesn’t work, it *still* says nothing about the quality of the original work.

    What is needed is a way to openly communicate these inconsistencies. This cannot happen until any “stigma” attached to the original publishers of the data is removed. Judging by many of these comments there is still a long way to go.

    1. “Judging by many of these comments there is still a long way to go.” You hit the nail on the head, and it is obvious, when you think about it: who has the time to spend making these kinds of lengthy and usually circumstantial comments? Certainly NOT practicing scientists with full-time jobs. Practicing scientists are busy doing science. So, what you have is a situation in which these kinds of comments are from keyboard warriors who have no good intuition about the natural variability in real biological datasets and all they can think of is fraud and sloppiness.

      1. Average PI wrote “…..who has the time to ….Certainly NOT practicing scientists with full-time jobs”

        As a practising scientist, full-time and busy doing science I do have time to comment here, sometimes even quite lengthy.

        It would be far worse to waste several years, energy and money pursuing non-science because of science fraudsters.

        1. Why would anyone “waste several years, energy and money to pursue non-science..”?

          You try something. It doesn’t work… you try again… it still doesn’t work… you mess around with the conditions and it still doesn’t work. At this point you talk to people, hopefully including the lab that published the procedure. If they appear convincing you send a PhD student to try it out in their lab… Maybe they don’t seem very convincing and you get on with what you were doing otherwise. It depends on how important you think it is, and perhaps your gut feeling or your colleagues’ insight/experience about the reliability of the publishing lab.

          1. Chris, it would take several months, even years, to reproduce some experiments; this includes preclinical work where several cohorts of animals are required.

            Potentially a lot of wasted time, a lot of wasted money, and no benefit to society.

    2. mmm wrote “Inability to reproduce published results is an expected result of biological experimentation. It has nothing to do with ethics or fraud or laziness.”

      After several decades of doing exactly that I simply cannot agree with that statement.

      Biological experiments are very easily reproducible by experienced, skilled researchers.

  11. After reading the recent story about Ranbaxy’s recall of Lipitor, I realized that these are all childish shenanigans by all these distinguished professors and scientists, compared to when someone does something unethical that affects the health of millions of people directly.

    “It’s just blacks dying” indeed… and the best part, just like in science fraud, no one is going to jail after the small matter of a 500 million dollar fine.

  12. “Almost one third of all trainees felt pressure to prove a mentor’s hypothesis ………”

    I have always thought that the best scientists want to test their hypotheses, not to prove them. Attempting to prove a hypothesis can lead to narrow thinking and poor experimental strategies.
