The Anil Potti retraction record so far

A 60 Minutes segment Sunday on Anil Potti has drawn national attention to the case, so we thought this would be a good time to compile all of the retractions and corrections in one place.

Duke has said that about a third of Potti’s 40-some-odd papers would be retracted, and another third would have “a portion retracted with other components remaining intact,” so this list will continue to grow. We’ll update it as we hear about new changes.

Retractions:

  1. “Gene-expression patterns predict phenotypes of immune-mediated thrombosis,” in Blood
  2. “Upregulated Oncogenic Pathways in Patients Exposed to Tobacco Smoke May Provide a Novel Approach to Lung Cancer Chemoprevention,” in CHEST
  3. “Characterizing the Clinical Relevance of an Embryonic Stem Cell Phenotype in Lung Adenocarcinoma,” in Clinical Cancer Research
  4. “An Integrated Genomic-Based Approach to Individualized Treatment of Patients With Advanced-Stage Ovarian Cancer,” in the Journal of Clinical Oncology (JCO)
  5. “Pharmacogenomic Strategies Provide a Rational Approach to the Treatment of Cisplatin-Resistant Patients With Advanced Cancer,” also in the JCO
  6. “Gene Expression Signatures, Clinicopathological Features, and Individualized Therapy in Breast Cancer,” in the Journal of the American Medical Association (JAMA)
  7. “Validation of gene signatures that predict the response of breast cancer to neoadjuvant chemotherapy: a substudy of the EORTC 10994/BIG 00-01 clinical trial,” in The Lancet Oncology
  8. “Genomic signatures to guide the use of chemotherapeutics,” in Nature Medicine
  9. “A Genomic Strategy to Refine Prognosis in Early-Stage Non–Small-Cell Lung Cancer,” in the New England Journal of Medicine (NEJM)
  10. “An Integrated Approach to the Prediction of Chemotherapeutic Response in Patients with Breast Cancer,” in PLoS ONE
  11. “A genomic approach to colon cancer risk stratification yields biologic insights into therapeutic opportunities,” in the Proceedings of the National Academy of Sciences (PNAS)

Corrections:

  1. “An integration of complementary strategies for gene-expression analysis to reveal novel therapeutic opportunities for breast cancer,” in Breast Cancer Research
  2. “Gene Expression Profiles of Tumor Biology Provide a Novel Approach to Prognosis and May Guide the Selection of Therapeutic Targets in Multiple Myeloma,” in the JCO
  3. “Age-Specific Differences in Oncogenic Pathway Dysregulation and Anthracycline Sensitivity in Patients With Acute Myeloid Leukemia,” in the JCO
  4. “Young Age at Diagnosis Correlates With Worse Prognosis and Defines a Subset of Breast Cancers With Shared Patterns of Gene Expression,” in the JCO
  5. “Age-Specific Differences in Oncogenic Pathway Deregulation Seen in Human Breast Tumors,” in PLoS ONE
  6. “A genomic approach to colon cancer risk stratification yields biologic insights into therapeutic opportunities,” in PNAS
  7. “Characterizing the developmental pathways TTF-1, NKX2–8, and PAX9 in lung cancer,” in PNAS

Partial retraction:

1. “A Genomic Approach to Identify Molecular Pathways Associated with Chemotherapy Resistance,” in Molecular Cancer Therapeutics

It’s important to point out, as one of our commenters has, that while attention has focused on Potti, who resigned from Duke in late 2010, he was just one of many authors on these papers. There are larger issues here involving data reproducibility and the methods used to produce the now-questioned results.

Also worth noting is that Potti is in no danger of displacing the current unofficial retraction record holder, Joachim Boldt, whose 90 or so retractions continue to be felt by his co-authors.

16 thoughts on “The Anil Potti retraction record so far”

  1. Interesting to note: Potti’s retracted papers relate to ovarian cancer, breast cancer, lung cancer, colon cancer, and immune-mediated thrombosis, with corrections relating to myeloma and leukemia.

    Such a scattershot application of a method to various clinical problems suggests that the genome approach is a technique in search of a question. In short, the papers seem to have been published by a coalition of people who fall into two categories: those who know the method, but have little familiarity with the clinical problem; and those who know the clinical problem but have no knowledge at all of the method.

    This seems a recipe for disaster.

    1. I don’t really agree. One cannot expect clinicians to know all aspects of expression profiling, which is why they collaborate with scientists, and vice versa – not every PhD is a doctor.
      I think the problem lies elsewhere. Leaving aside “honorary authorships” (friends and relatives of people who actually did something for the paper), among the people who actually did the work there was little if any internal control of the process and results, and only a very few of them understood how the results were obtained well enough to provide quality control. This is why you have to have at least two good bioinformaticians checking your work before you trumpet your discovery and go to The Lancet. I just feel – no proof, etc. – that it was a sloppy analytical job done by incompetent people who did not ask for help from people who actually understand this. And instead of checking and validating whatever they found, they were so arrogant that they turned poorly analyzed results into clinical trials and kept using a faulty method for further tumour profiling… Incompetence combined with arrogance is always a protocol for disaster…
      I think that Potti received (rightfully) the biggest portion of the blame, but I am sure that other people with their names on these papers have some explaining to do.

    2. You are correct, Dr. Steen – this was indeed a methodology in search of an application, and we are witnessing the unfolding of the disastrous recipe.

      The authors of an early, underpinning publication

      Rainer Spang, Harry Zuzan, Mike West, Joseph Nevins, Carrie Blanchette, Jeffrey R. Marks

      (available at http://www.bioinfo.de/isb/2002/02/0033/main.html)

      indeed stated as much: “Due to the very general setting of our model, we expect it would be successful for a large class of diagnostic problems in various fields of medicine.”

      Unfortunately, this paper contains as-yet unproven assertions. Given the number of retractions to date involving this methodology, and the general statistical principles violated by the methodology, my guess is that if anyone ever attempts to complete the research outlined in this paper and provide actual scientific evidence testing those assertions, the outcome will be a repudiation of them. Indeed, Baggerly and Coombes did perform a “frequentist”-based assessment of this “Bayesian”-based methodology, which did exactly that.

      A predictable reply from the authors (based on their reply to Baggerly and Coombes’ letter to the JCO) will be “To reproduce means to repeat, using the same methods of analysis as reported.” However, good scientists know that reproducibility also means being generally reproducible by similar methodologies. Indeed, many argue that similar results should be obtained by related methods rather than by the exact same method – if related techniques cannot recapitulate an initial finding, for example a PCR experiment to verify a Western blot finding, then serious questions and doubts remain.

      An interesting development in the field of statistics during the 20th century was the gradual acceptance of Bayesian-based methods. Bayesian practitioners were somewhat ostracized in the earlier part of the century, but after some very talented theoretical statisticians pointed out difficult problems encountered by practitioners of frequentist-based methods that could be addressed with Bayesian-based methods, the Bayesians were able to come out of the closet and develop much-needed methods that continue to prove extremely valuable in modern statistics. Unfortunately, some practitioners flipped the other way, deeming Bayesian-based methods somehow inherently more proper than frequentist-based methods, and, perhaps in response to the earlier ostracism, began deriding frequentists. Such beliefs that one method is inherently superior to the other do not constitute good science, but rather reflect interesting aspects of human behaviour.

      In truth, most statistical issues have a frequentist-based solution and a Bayesian-based solution, and these solutions converge (comfortingly) to the same answer asymptotically, as the sample size grows towards infinity. Indeed, in many cases where a frequentist solution differs from a Bayesian solution, much valuable insight into deep statistical philosophical issues has been gained.
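
      As a small, purely illustrative toy example of that convergence (my own sketch, not anything taken from the papers discussed here): for a simple binomial proportion, the frequentist maximum-likelihood estimate is k/n, while the Bayesian posterior mean under a Beta(a, b) prior is (k + a)/(n + a + b). The prior’s influence fades as n grows, and the two estimates agree ever more closely.

      ```python
      # Toy illustration (not from any of the Duke papers): frequentist MLE vs.
      # Bayesian posterior mean for a binomial proportion. The two estimates
      # converge as the sample size n grows.
      import random

      random.seed(1)
      true_p = 0.3
      a, b = 2.0, 2.0  # Beta prior hyperparameters, chosen arbitrarily here

      for n in (10, 100, 1000, 100000):
          k = sum(random.random() < true_p for _ in range(n))
          mle = k / n                        # frequentist estimate
          post_mean = (k + a) / (n + a + b)  # Bayesian estimate
          print(f"n={n:6d}  MLE={mle:.4f}  posterior mean={post_mean:.4f}")
      ```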

      Thus, the frequentist-based assessment undertaken by Baggerly and Coombes of this Bayesian-based methodology, from which they concluded “Using the scores that we compute, we do not see a large story here”, suggests that even if the Bayesian-based methodology were corrected to avoid the statistical no-no of using ALL your data to derive “supergenes,” it would not perform well. Baggerly and Coombes’ frequentist-based methods should yield outcomes similar to the Bayesian-based method’s, especially as sample sizes get large. If they do not (and this is research that I have never found published), then we have ourselves one of those frequentist-Bayesian discordant situations that represent a valuable learning opportunity.

      An unfounded and unproven assertion in the article cited above reads “One might suspect that the method just “stores” the given class assignments in the parameters . Indeed this would be the case if one uses binary regression for n samples and n predictors without the additional restrains [sic] introduced by the priors. That this suspicion is unjustified with respect to the Bayesian method can be demonstrated by out-of-sample predictions.” Herein lies the assertion that somehow this Bayesian-based method is not subject to the known statistical phenomenon of overfitting a model to a data set. Until a proper mathematical and computer-simulation analysis of this issue is performed, all of the assertions made in dozens of papers using this methodology remain on shaky grounds.
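
      To make the over-fitting concern concrete, here is a minimal sketch (entirely synthetic data, and emphatically not the Duke/West code) of the kind of bias that arises when predictive features are chosen using ALL of the samples and their labels before cross-validation, compared with choosing them inside each training fold. The data below are pure noise, so an honest analysis should report chance-level performance; the leaky analysis typically reports an impressively high AUC anyway.

      ```python
      # Sketch of feature-selection bias on pure-noise data (illustrative only).
      import numpy as np
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import StratifiedKFold, cross_val_score
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(0)
      X = rng.standard_normal((60, 5000))  # 60 "samples", 5000 "genes", all noise
      y = np.repeat([0, 1], 30)            # two arbitrary classes
      cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

      # Improper: pick the 50 most "informative" genes using all samples and
      # labels, then cross-validate only the classifier. The selection step has
      # already peeked at the test folds, so the AUC comes out optimistically high.
      selected = SelectKBest(f_classif, k=50).fit_transform(X, y)
      auc_leaky = cross_val_score(LogisticRegression(max_iter=1000),
                                  selected, y, cv=cv, scoring="roc_auc").mean()

      # Proper: do the selection inside each training fold via a pipeline,
      # so the held-out fold never influences which genes are chosen.
      pipe = make_pipeline(SelectKBest(f_classif, k=50),
                           LogisticRegression(max_iter=1000))
      auc_honest = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc").mean()

      print(f"AUC, selection on all data  : {auc_leaky:.2f}")   # far above 0.5
      print(f"AUC, selection inside folds : {auc_honest:.2f}")  # near 0.5 (chance)
      ```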

      1. Dr. McKinney – have you written to Duke or the NCI about your concerns and asked that they conduct an INDEPENDENT review of all papers that involve this methodology? I have seen the document you sent to the IOM. You seem to have a very clear understanding of the pitfalls of such methods, and in the best interests of science it must be noted that these methods have been used by the Duke investigators and still are! See the recent papers by Ginsburg GS et al. (in Cell Host and Microbe, etc.). Please stay on this important topic; hopefully someone at Duke will have the common sense to evaluate all papers involving this methodology, rather than a ‘chosen few’ that seem to have been chosen by Duke institutional officials or even by Dr. Nevins himself. That would be scary, because he clearly had several conflicts of interest in this work: he was the founder of a company called Expression Analysis, which, BTW, provided services to all of the genomics labs and all of the projects at Duke, including the clinical trials!

      2. @BP

        Baggerly and Coombes did an amazing job corresponding with officials at Duke. The responses and actions taken by Duke officials are now well documented. Given that track record, I’m not certain that anything I could write to them would produce any different result.

        When I wrote to the Institute of Medicine, I was not yet privy to the excellent work being done by people at the NCI, for example the fine work of Dr. Lisa McShane. NCI investigations were private at that point. After that first IOM meeting, documents were released detailing much of the NCI’s investigation into the Duke methodology. The NCI personnel had access to far more material than I could ever obtain, and the degree of competence with which they handled the material is evident in the IOM documents. At that point I realized that this issue was being handled well by very talented people, and I did not need to say anything more there.

        The IOM review necessarily needs to be broader than just understanding the problems with an analytic paradigm at one particular institution. Other institutions will come up with other complex analytic systems, so the IOM needs to outline broad strategies to cover those as-yet unknown situations as well. This is why they are focusing on ways to require an Investigational Device Exemption, or something similar, for analytics, not just for the physical devices originally targeted by IDE regulations. Others have stated their dismay that the IOM is not focusing on Duke, but the IOM is wisely taking a broader view of the issue.

        I and others will continue to focus on the Duke situation to ensure that methodologies are better vetted, if not internally by Duke personnel then by us concerned outsiders, in order to stop the unfortunate, excessive expenditure of resources on as-yet unproven methodologies.

        @Pablo

        You are correct – better cross-disciplinary understanding is essential. Complex interdisciplinary technologies are here to stay.

      3. Dr. McKinney,

        First, I’ll filter out the kooky Bayesian-frequentist screed in your post that has nothing whatsoever to do with the issue at hand.

        That said, yes, Baggerly and Coombes did indeed identify problems with the manuscripts from Potti. However, you bizarrely focus on the methodology rather than the well documented problems with data mishandling and alleged data manipulation. What next, are you going to blame Microsoft because the analysis was done on a Windows PC?

        Bad Horse says neigh to your post.

      4. @Bad Horse

        “First, I’ll filter out the kooky Bayesian-frequentist screed in your post that has nothing whatsoever to do with the issue at hand.”

        This is rich, coming from someone unable to post his/her true identity.

        Labeling the two main branches of statistics as “kooky” is, well, kooky!

    3. Disaster it was in this particular case. And this type of combination is pervasive in science. I think the lesson to learn is not that technology-based collaborations should not be pursued, but rather that it is important to know the risks. For that, better education in the “other” field is needed on both sides.

  2. T-tests comparing two groups might also be called a method in search of a question. Perhaps enrichment testing of array results is a better example.

    I think an interesting aspect of the case is how slowly and poorly the journals, the authors, and Duke responded to the situation after serious problems had been pointed out. It says something about peer review, too, of course.

    1. T-tests have a far better foundation than the methodology used in these retracted and other unretracted papers. Ronald Fisher, the early-20th-century statistical genius who laid the foundation of likelihood theory, took Gosset’s T-test (developed in response to a question – comparison of batches of beer at Guinness – yum) and properly worked out the degrees of freedom and the geometry underlying the T statistic. Distribution theory for the T statistic now has a hundred-year track record of performance, so I would not compare this new method in search of a question to T-tests. Supergene statistics at this point must be taken on faith, as the proponents of the methodology have not yet shown proper error rates for it.

      Bradley Efron, our current statistical genius who laid the foundation of bootstrap theory and much more, has reviewed the Duke methodology as well. It is telling that Efron’s response to reviewing this methodology was to (1) move Baggerly and Coombes’ paper into print in the Annals of Applied Statistics (of which Efron is editor in chief) and (2) add his name to the letter sent to Harold Varmus on July 19, 2010, signed by 33 statisticians imploring the NCI to stop the trials then underway using this methodology. Quite a different picture from Fisher’s response to Gosset’s methodology.

      1. Supergene methods can be done just fine by some groups. I hope I understand you correctly; if not, I apologize.
        If you have a closed-box test set you can do anything you want, dumb or smart. Completely blinded test sets, however, have a very interesting moderating effect on statistician behavior – they make you very scared of over-fitting. They don’t care how fancy your methods are, nor what your intentions might be. They don’t care about your faith in your method; they just judge it, gravely. Cross-validation can be done quite fairly too if you are careful, though there are 50 ways to help it along that aren’t strictly proper, and readers and reviewers need to be awake to that.

        The well-known example for me:
        In the Shedden lung paper (PMID: 18641660), Shedden’s methods did pretty well compared to what other people tried. He first reduced the data to 100 meta-genes by clustering all the genes with a K-means clustering algorithm and using the average of the genes in a cluster (after standardizing each gene) as the “meta-gene”. He then used ridge-penalized Cox models to relate survival to these meta-genes (he just calls them “features”, perhaps disdaining cool-sounding bio-speak). Details are around page 58 of the supplement. There was not just one but two closed-box test sets, of assays performed at two different institutions. The results are not super-fantastic, perhaps simply because it’s a hard problem and the test sets really were held back (I have a pet theory about why lung cancer is particularly hard – it’s not mets from the resected early-stage tumor that kill you, it’s a different tumor, which wouldn’t happen if we removed all of both lungs, but patients don’t like that treatment option much). I am a co-author (the 26th, I think), and so am likely biased, but Kerby Shedden and Jeremy Taylor make me look like a dwarf, I assure you, so I am more like just a cheer-leader. I don’t doubt principal-components regressions have been done properly many times too.
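
        For readers who want to see the shape of that kind of pipeline, here is a rough sketch on synthetic data (my own arbitrary toy settings, not Shedden et al.’s actual code or parameters): standardize each gene, cluster the genes into meta-genes with K-means, average within each cluster, and relate survival to the meta-genes with a ridge-penalized Cox model.

        ```python
        # Rough sketch of a meta-gene + penalized Cox pipeline on synthetic data.
        # Settings are arbitrary toy choices, not Shedden et al.'s parameters
        # (the paper used 100 meta-genes; fewer keeps this toy example quick).
        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        n_samples, n_genes, n_meta = 200, 2000, 20

        # Fake expression matrix (samples x genes) and fake survival outcomes.
        expr = rng.standard_normal((n_samples, n_genes))
        time = rng.exponential(scale=5.0, size=n_samples)
        event = rng.integers(0, 2, size=n_samples)  # 1 = event observed, 0 = censored

        # 1. Standardize each gene across samples.
        expr = (expr - expr.mean(axis=0)) / expr.std(axis=0)

        # 2. Cluster the GENES (not the samples) with K-means, then average the
        #    standardized genes within each cluster to get one meta-gene per cluster.
        labels = KMeans(n_clusters=n_meta, n_init=10, random_state=0).fit_predict(expr.T)
        meta = np.column_stack([expr[:, labels == k].mean(axis=1) for k in range(n_meta)])

        # 3. Ridge-penalized Cox regression of survival on the meta-genes.
        df = pd.DataFrame(meta, columns=[f"meta_{k}" for k in range(n_meta)])
        df["time"], df["event"] = time, event
        cph = CoxPHFitter(penalizer=0.5)  # L2 penalty; strength chosen arbitrarily
        cph.fit(df, duration_col="time", event_col="event")
        print(cph.summary.head())
        ```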

      2. You are right, Rork – other groups, such as Beer’s group in Michigan, with which you were involved, have shown much better handling of such methodology. My references to “supergenes” were specifically to the Bayesian-based methodology developed by Mike West at Duke and used in so many of their published papers. I do not see the word “supergenes” in Beer et al.’s 2002 paper “Gene-expression profiles predict survival of patients with lung adenocarcinoma” – rather, they use “50-gene risk index” and “expression signatures” in their 2008 follow-up validation study, “Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study”. As you say, the results in the 2008 validation study “are not super-fantastic”, though they show an honest evaluation of a method with some potential. This is the issue with the dozens of Duke papers, which consistently show “super-fantastic” ROC curves and other measures of classifier performance. Such repeatedly optimistic findings suggest a problem with over-fitting and improper performance assessment – if it looks too good to be true, it probably is. The dozens of ROC curves shown in the supplementals of the Shedden, Beer et al. 2008 paper look like a reasonable set of ROC curves that I’d expect to see in such a study – not all of them (or indeed very many at all) showing area under the curve close to 1.0, as claimed in paper after paper by the Duke groups.

        I agree that completely blinded test sets have a moderating effect on the behaviour of statisticians – they don’t make me scared of over-fitting, they protect me from over-fitting by judging honestly, without regard to faith in a method. Faith in a method does not constitute good science, and faith should be gravely tested before patients are exposed to potential harm by any method.

  3. Where did the peer-review process fail before these manuscripts ended up in print? Why is Potti still getting published as late as 2012? As a PI, Nevins should have looked at the raw data before allowing his name to be put out with the trash. And what about papers that have not been retracted but base arguments on references that have been retracted? For instance, PMID 18800387 references PMID 21366430. I’ve witnessed data falsification in every lab I’ve worked in except for my current position at Michigan, and it pisses me off so much when I hear about it or find out about it. In my opinion, Potti should not be practicing medicine in the US and should be deported to India. Nevins should be in jail.
