So when is a retraction warranted? The long and winding road to publishing a failure to replicate

Sometime in 2009, the University of Nottingham’s Uwe Vinkemeier thought something was wrong with two papers he read in Genes & Development, one from 2006 and one from 2009. The papers claimed to show how changes to a protein called STAT1 affect programmed cell death. So he did what scientists are supposed to do: He tried to repeat the experiments, to replicate the results.

He couldn’t.

So he submitted the results to G&D, which was initially willing to publish the data along with a rebuttal by the original authors. But everyone seemed to be dragging their feet. Meanwhile, Vinkemeier presented the data in a poster at the JAK-STAT meeting in Vienna in February 2010. The last author of the studies Vinkemeier’s team was questioning, Thorsten Heinzel, of the Friedrich-Schiller-University of Jena, was also there. Heinzel gave the poster a look, suggested a few reasons why the results might not be the same, and offered to send some samples to Vinkemeier’s team.

Vinkemeier requested some of those samples — in this case complementary DNA, or cDNA, from which the authors had made the STAT1 proteins they tested — on the spot. But none came. Finally, in August, G&D, which was still mulling the manuscript, suggested Vinkemeier’s team test Heinzel’s team’s cDNA. We’d love to, Vinkemeier said, but they hadn’t sent it.

cDNA did arrive in Vinkemeier’s lab a few weeks after G&D’s suggestion, but it “was not labelled as it should have been,” and it didn’t seem to be any of the material described in the papers, so the team declined to use it. Then, a few weeks later, they got what seemed like the right stuff.

They sequenced the cDNA, and found three mutations that the original paper hadn’t described. One of them changed the protein sequence in an important way.

By this time, Vinkemeier had tired of the foot-dragging at G&D, and had submitted the paper to Molecular and Cellular Biology, which published it on May 16. There, Vinkemeier and coauthors Filipa Antunes and Andreas Marg wrote:

…the results presented here conclusively exclude the possibility that STAT1 signaling is regulated by a phosphorylation-acetylation switch proposed by Krämer et al.

They conclude:

As reported here, in spite of our best attempts to exactly reproduce the previously reported experiments of references 22 and 23, we found no experimental confirmation of the results presented there. We are unable to provide a scientifically acceptable explanation for the complete dichotomy of our findings with respect to those reported by Krämer et al.

We asked Heinzel and first author Oliver Krämer for their response to the MCB paper:

We would like to stress that several key findings of our laboratory that were criticized by Dr. Vinkemeier (e.g. STAT1 acetylation and the effect of HDAC inhibitors on STAT1 phosphorylation) have been independently published by other groups, some before and some after our works appeared. In our opinion this puts the failure to reproduce our data in Dr. Vinkemeier’s lab into perspective.

We also point out that Dr. Vinkemeier did not follow our published experimental protocols in all details. He likewise did not contact us directly to resolve technical issues. However, the Editorial Office of Genes&Development contacted us last year and early this year to inform us about Dr. Vinkemeier’s criticism. We submitted original scans of blots, additional information and new data to the Editorial Office to clarify this issue.

We are preparing a manuscript which extends our findings and also responds to the Antunes et al paper.

We asked Heinzel and Krämer for the references of papers that had replicated their results, but haven’t heard back.

We don’t know if the original papers should be retracted. That’s best left to experts in the field. The editors of G&D, for example, don’t seem to think that a retraction is warranted. (They haven’t responded to our requests for comment.) But we thought this was a good story to show what happens when someone tries to replicate findings, and start a discussion on Retraction Watch about what should happen next in such cases.

The comment thread, as always, is open.

Thanks to Jeff Perkel for help interpreting the papers.

29 thoughts on “So when is a retraction warranted? The long and winding road to publishing a failure to replicate”

  1. Quoting another Retraction Watch reader, if I wore a hat I would take it off for the Vinkemeier group. Going up against unreproducible results is hard, to say the least, and often unrewarding. Scientifically, I think they have rather strong points against the Krämer/Heinzel manuscripts, too. Shame on G&D for slowing down the process. This is unacceptable.

  2. It is a serious dereliction for you not to make exquisitely clear that a failure to replicate is a far, far different thing from evidence of intentional fraud. In the context of this blog, you leave a very inaccurate impression that it is the failure to reproduce prior results that is the major factor. It IS a factor in whether that prior report is likely to be *valid*, but not whether fraud was involved. Papers that were wrong, for whatever reason short of intentional fraud, should not be retracted.

    1. Oh yes, in this case I think they should be retracted. Even the mutated cDNA sequences should warrant a retraction since obviously this was sloppiness, not science. In my book, science is the pursuit of truth. Don’t get me wrong, if there was no way to detect artefacts, authors cannot be blamed for publishing results that turn out to be wrong. If there was a way as simple as sequencing plasmids, well, they should definitely have done that. And this is not even taking into account that the Vinkemeier group took a great deal of effort to deconstruct many of the results published in G&D, this time even using the correct cDNAs. Another good reason to retract those papers is that, generally speaking, people base their careers on publishing studies like this, thereby taking away job and grant opportunities from people who do it correctly.

    2. Thanks for the comment. For the record, this post says nothing about fraud, and is all about the failure to replicate.

      We have to respectfully disagree that fraud is the only reason to retract. Based on these posts, we’re not the only ones who feel that way:

      The Committee on Publication Ethics also agrees:

      Right at the top, COPE’s retraction guidance says:

      Journal editors should consider retracting a publication if:
      • they have clear evidence that the findings are unreliable, either as a result of misconduct (e.g. data fabrication) or honest error (e.g. miscalculation or experimental error)

      We appreciate that others may differ, and that was the point of this post, to get that discussion going.

    3. I disagree strongly that only intentional fraud should lead to retraction. Ivan already provides some examples of non-fraudulent papers that are retracted, because the content is just plain wrong due to an error. Elsevier will explicitly remove articles if an error is found that may have consequences for human health.

      Now, the case here is different in the sense that one other group cannot reproduce the results. Without wishing to take sides here, one group being unable to reproduce results of another is not sufficient to claim error and discuss retraction.

  3. In September 2008, BD Biosciences removed monoclonal antibody 8H1 against human C3a receptor from its catalogue. They did so only after I engaged the ombudsman at the Medical School of Hannover/Germany. I had found out that 8H1 heavily cross-reacted with unknown epitopes and that the lab which established the monoclonal antibody had been fully aware of its unspecific binding for quite a long time, since a comprehensive description of it was contained in a German thesis (2000). In at least 5 publications from different groups around the world, 8H1 was used for the immunohistochemical characterization of C3aR. Those results were highly questionable or outright wrong, as I found out with our own, newly established antibodies. A manuscript containing our contrasting findings was rejected until I contacted an old friend, now editor-in-chief… When I informed the authors of these five papers that antibody 8H1 was no longer available due to proven unspecificity and that their published findings required independent confirmation, only one answered that his lab was no longer engaged in C3aR research. The others didn’t even care to reply. The ‘Abnormal Science Blog’ covered the story (in German).

    1. For me, science is all about some big words. Truth, honor, honesty, responsibility, credibility, to name a few. These virtues can only be achieved through education and very high personal standards. Can people who blatantly do not care about said virtues be called scientists? It tells a lot if somebody does not promptly and sufficiently react to allegations of artefacts or fraud emanating from their labs. Trying to bring forward one’s own career at all cost is short-sighted, to say the least. If the big words are going down the drain, what’s left? If one paper from a given lab is fishy, I do not take the time to consider the rest to be valid, because the authors obviously don’t care about virtues a scientist should have.

  4. I initially responded on DrugMonkey’s post and then decided I had more to say. The tl;dr version: Retractions are not just meant for fraud. Lack of reproducibility does not necessarily call for retraction. Maybe one day as scientific publishing in the digital age matures, we’ll have other ways of dealing with such subtleties.

  5. I still don’t see that they have provided evidence of a definitive error. Even then, if there was an error, that error is extraordinarily informative. A change in base pair provided functionality that a protein didn’t have before? That’s really fascinating.

    I agree with DM that this does not seem to warrant a retraction. Just more investigation and experimental data from other scientists.

    1. Isis, have you experienced a situation in which you wasted time and money on trying to reproduce published data and found out that it is simply unreproducible? I guess not. It will make you think different. And why reward sloppiness? Changes in protein sequence are a serious error. This was not described in the G&D manuscripts. As such, they do not reflect the real situation. They should be retracted in order not to mislead other investigators, to say the least. Responsible behaviour is getting out of fashion, it seems. A scientist should be ashamed to have his or her name connected to a manuscript which is obviously flawed. I realize that many lab people don’t agree and simply go on. Probably that’s what will happen in this case. Journals have commercial interests, retractions decrease their success. Thorsten Heinzel is the dean of the biology department of the University of Jena. Who should put pressure on him? And why should the university damage its reputation by starting an investigation?

  6. Isis, have you experienced a situation in which you wasted time and money on trying to reproduce published data and found out that it is simply unreproducible? I guess not.

    You don’t know that.

    They should be retracted in order not to mislead other investigators, to say the least. Responsible behaviour is getting out of fashion, it seems.

    We’d better go back and retract all that stuff Bohr and Haldane did then. Lest anyone be misled.

    You’re making a lot of assumptions about the outcome of this case, tk. Given that this remains unresolved, why do you assume negligence or fraud?

    BTW, ping.

    1. Isis, you are totally right. I don’t know about your experience, that’s why I am guessing.

      Please, do not compare Bohr to this case. In my opinion, the situation is different. Also, I would like to point out that I did not imply fraud. There is no evidence for that. However, having cDNA inserts of plasmid vectors sequenced is a zero-difficulty low-cost thing Krämer et al. failed to do. The term “negligence” would describe what they did in this respect. This is not even considering the nonreproducibility reported by the Vinkemeier group.

      One question that comes to my mind is whether you (and other commenters) actually read the Heinzel manuscripts and the recent Vinkemeier manuscript? This is not meant harshly but aims to kick off thinking by providing insight and novel views. The details of each case, including this one, are often crucial.

  7. Inability to reproduce data by itself doesn’t warrant a retraction. There are plenty of times when similar experiments have different results, often for legitimate reasons. However, this case certainly raises the possibility — or perhaps likelihood — of error in the original paper. If that’s the case, then the new findings should be published and the original paper retracted so that others do not base their work on faulty research.

  8. To my mind, retractions for error are something of a slippery slope. Scientists are wrong all the time; there’s hardly a paper that doesn’t have some kind of error in it, though the error may have no impact on the conclusion.

    When should a scientific error result in a formal retraction? Perhaps we should reserve retraction for an error that is both well-understood and has an impact on the conclusion of the paper. By this logic, failure to replicate is insufficient reason to retract until scientists elucidate what error led to the failure to replicate. If used in this way, retractions would actually advance the science.

    Failure to replicate can be based on many things, including legitimate differences in experimental procedure (e.g., subtle differences in detailed methods, source of reagents, failure to control unknown or unspecified variables, etc). Journals never give enough space that authors can describe every method in sufficient detail to enable someone else to replicate, so some failure to replicate is expected.

    I think we want to avoid a situation where an idea falls out of fashion (e.g., XMRV virus and chronic fatigue syndrome), then authors are pressured to retract in the absence of clear evidence of error or fraud.

  9. A very instructive case: G&D reminds me of commercial rating agencies like fishy S&P or Moody’s, apparently neutral but not straightforwardly engaged in transparency and scientific integrity. This really corrupts their reputation, therefore I fully agree with tk: Shame on them for slowing down the process.
    And by the way: What about the funding agency and the university, Jena University in this case? They should be eager to bring transparency and integrity back into the race. If all these players allow grey areas to be established, and give space to fishy results like those of the Heinzel group, we all have a problem.

  10. Let’s ask an evil question: so far it appears that most people here are assuming that the Heinzel group did something wrong.

    What if Vinkemeier is wrong?

    1. I would not call this question evil, but naive. This does not mean bad, asking naive questions is what often provides new angles of thought. My thoughts on this are:
      – Vinkemeier is not wrong by definition, the group could just not reproduce the Krämer/Heinzel results.
      – The group obviously could reproducibly not reproduce the Krämer/Heinzel results.
      – Heinzel failed to help clarify the reasons for nonreproducibility in the beginning.
      – Later on, after some difficulties with labeling of tubes, he sent reagents which were found to be flawed in some way. Why did Krämer/Heinzel not validate their reagents?
      – Why should a scientist not be highly interested in helping others to reproduce his results? Why delay the process?
      – It takes considerable energy to generate and publish a manuscript which simply negates previous findings. Why did the Vinkemeier group choose this approach, rather than letting it go, if they were not fully convinced that the Krämer/Heinzel results do not reflect the real situation?
      – My last point is not a thought, but an open question to all readers. Did anybody try to reproduce the Krämer/Heinzel results from the two G&D manuscripts? I did not. One group I know did, and they failed. As I understand, they did not do as many experiments as the Vinkemeier group. Still, it is only one group in addition to the Vinkemeier lab, so n=3, two report negative outcome regarding the claims made in the G&D papers.

      Taken together, I think that an independent investigation would be the best option. Also, I would be interested which other groups reproduced the Krämer/Heinzel results. The statement that “We asked Heinzel and Krämer for the references of papers that had replicated their results, but haven’t heard back.” is quite devastating.

      1. Being “fully convinced” of something doesn’t mean you’re right. My sense of Wakefield is that he is still fully convinced that his ideas are true, though there is incontrovertible evidence of the fraud involved in publishing them.

  11. The fact that more than one lab was unable to reproduce the findings, taken together with Heinzel’s obvious lack of cooperativity to resolve the problem, makes the whole issue very suspicious. In the interest of science and their own credibility, G&D should retract the paper.

  12. This situation is not easy to assess. But consider a peer-reviewed paper in a journal of pure mathematics. Suppose the paper contained a faulty proof of a theorem, not easy to spot, and so it went years without detection. Should the paper be retracted? Should an erratum be published? How would you feel if you contacted the journal and they did not care to do anything at all, aside from telling you to contact the author? Suppose you do and the author does not reply.
    As a good researcher, you decide to ignore the faulty results, move on and find a journal to publish your work, though it does not reference the flawed results of that paper. Things get tough, tough for progress.

  13. Failure to reproduce can occur for subtle and non-obvious reasons that have nothing to do with misconduct and everything to do with the fact that we cannot, ever, control every single potentially important variable. A classic example: the use of plastic vs. glass tubes to assay heterotrimeric G protein activation:
    “Original observations with detergent extracts of plasma membranes from S49 cels [sic] suggested that activation of G/F by F- required or was markedly stimulated by nucleotide (5,7). At this time it was admitted that the nucleotide specificity was unclear (and variable). When these experiments were repeated with purified G/F from rabbit liver, the requirement for nucleotide was sporadic and the specificity was ill-defined. Further study of the activation thus required stabilization of the requirement for nucleotide. This was eventually achieved by observing two restrictions: (i) the use of glass-distilled water to make up all reagents used for activation and (ii) manipulation of all solutions and activation reactions in plastic tubes or acid-washed (and extensively rinsed) glass tubes. Under these conditions Mg2+ and F- support only minimal activation of G/F (Fig. 1). If ATP was then added to the incubations, a clear, time-dependent activation of G/F was observed…”

    Thus was a (repeated) failure to reproduce converted into a powerful suite of biochemical tools:
    “I do not know who decides what should be the biological molecule of the year. For 1997, I vote for aluminum fluoride. The choice of this small inorganic molecule may seem strange, but this year a number of reports on its use have appeared, with far-reaching consequences for our understanding of some very fundamental processes in biological systems.”

  14. Following up on a comment on a different Retraction Watch article, there are some strange, very straight boundaries between some lanes in figures from the aforementioned papers and another one I came across by chance:
    Krämer…Heinzel G&D 2006 Fig. 1B
    Krämer…Heinzel G&D 2009 Figs. 4A,H
    Krämer…Heinzel The FASEB Journal May 1, 2008 vol. 22 no. 5 1369-1379 Figs. 1A, 2E, 3C, 4A,B,C,F, 5B, 6B
    In general, it is completely OK to remove some unnecessary lanes, but — if an experiment was reproduced several times, why not choose a gel loading scheme which makes image editing unnecessary? Also, in several instances, there is no possibility that the loading control was taken from the same membrane since the loading control was not edited. In my opinion, this approach is not good practice. Please have a look at the figures I mentioned and see for yourself whether you agree with me.
    Please note that this does not imply fraud.

  15. OK, n=4, we tried to reproduce experiments from the two Krämer papers and we could not. We contacted them, and they made some suggestions, but it still did not work. We then requested the glutamine substitution mutant and the wt control plasmid. They were sent to us and although we would get some transformants, we would get completely miserable yields from maxipreps, like a few micrograms on several tries. Tired, I engineered the mutations in my STAT1 constructs, fully sequenced them and tested them. Again, we could not reproduce the data, completely clear cut. I checked the crystal structure of STAT1 bound to DNA and saw that the two lysines were involved in the interaction with the DNA, which raises the questions of how they could become acetylated by CBP in the first place and how they could fail to interfere with DNA binding. Kudos to the Vinkemeier team and shame on G&D.

  16. Genes & Development, the journal that for the longest time managed to ignore any wrongdoing on its pages, has been hit hard by two retractions in short succession. What went wrong for the editors who work tirelessly to uphold all sorts of data, however irreproducible and hopelessly flawed it might prove? The answer is simple: the editors were betrayed by their authors, who stabbed them in the back by admitting to fraud. Thus, for no fault of their own, G&D’s pages have now been soiled by the nasty R-word. To make sure such regrettable transgressions do not repeat themselves, this CALL FOR PAPERS is issued to rally reliably inventive and creative characters as authors like those whose works are featured on this blog. Submission of profoundly unique results of highly unusual significance in any area of theoretical experimental biology is strongly encouraged. Presubmission inquiries will be handled with the appropriate discretion; and postpublication expressions of doubts will receive the unwavering ignorance our authors have learned to rely upon.

    1. After having seen the reactions of several well-established life science journals, I have no hope left that they have any interest in publishing the truth. Which means that they are not interested in science. Personnel often do not even have any scientific education, yet decide on matters like these. Journal employees who are able to understand what is going on should know better than working towards their own demise — with each and every case, it becomes clearer that the journal system has to vanish, being part of the problem, not of the solution.
