Funding agency sanctions Bulfone-Paus and former postdoc

Silvia Bulfone-Paus

Retraction Watch readers may recall the case of Silvia Bulfone-Paus, a researcher at Germany’s Research Center Borstel who was a frequent subject of posts in the early days of this blog. Bulfone-Paus has had to retract 13 papers amid investigations into allegations of image manipulation.

To briefly recap: In May 2010, several months after concerns had first been raised, Borstel let the DFG (German Research Foundation) know about the allegations, because they had funded the work. A November 2010 report from Borstel said that the allegations had merit, blaming two of Bulfone-Paus’s postdocs but criticizing how she supervised them. As the DFG notes in a summary of its findings on the case, posted late last week:

…in February 2011 the DFG Committee of Inquiry on Allegations of Scientific Misconduct recommended that the researcher not appoint any new research personnel to DFG-funded projects until the DFG had concluded its inquiry and that she withdraw from her position as project leader. Bulfone-Paus had already voluntarily suggested that the DFG should not appoint her as reviewer nor as a member of any DFG committees for a period to be stipulated by the DFG.

The DFG has now closed the case, and has imposed sanctions:

The DFG Committee concluded that Bulfone-Paus had committed “gross negligence of her supervisory duty” in her function as the leader of the working group and was therefore guilty of scientific misconduct as stipulated in the DFG procedures. On the basis of this result, the Joint Committee of the DFG decided to issue Bulfone-Paus with a written reprimand, to prohibit her from submitting proposals for three years, and to exclude her from statutory bodies at the DFG and not to appoint her as a reviewer for three years. However, since Bulfone-Paus had voluntarily suggested at the start of the proceedings that she withdraw from her appointments, not be appointed as a reviewer, nor be included in statutory bodies, the Joint Committee decided that this period should count towards the measures taken, leaving only the issue of the written reprimand to continue in effect.

“These measures represent a suitable and appropriate means of reprimanding Ms. Bulfone-Paus for the sustained neglect of her supervisory responsibility towards the early career researchers. As an experienced researcher, Ms. Bulfone-Paus did not fulfil the essential function of providing a good role model for her colleagues,” said the DFG Secretary General, Dorothee Dzwonnek.

The DFG also closed its case involving Elena Bulanova, a former postdoc in Bulfone-Paus’s lab:

On this basis, the Joint Committee of the DFG decided, as recommended by the DFG Committee, to issue Dr. Bulanova with a “written reprimand” and to prohibit her from submitting proposals for five years. This outcome is intended to serve as an unmistakeable reminder that data and research results must be handled correctly and carefully. “Falsification of research results in publications is inexcusable and cannot be justified under any circumstances,” stated DFG Secretary General Dzwonnek. “A ban on submitting proposals for a significant period of time is therefore appropriate.”

Other investigations into Bulfone-Paus’s work continue at Borstel and at the University of Lübeck, where she and her husband have both held faculty positions. The Charité in Berlin is also looking into her research.

Read more details at Der Spiegel.

18 thoughts on “Funding agency sanctions Bulfone-Paus and former postdoc”

  1. According to Spiegel Online, the two (Bulfone-Paus and her husband Paus) are fighting sanctions proposed by the University of Luebeck (generally speaking; this is an imprecise translation of what appear to be very detailed rules at the university). This scandal has been generating smoke for two years or more.
    Apparently, the pair were involved in some data manipulation(?) according to the U of L, but confidentiality rules bar further details from being revealed (at least until they have had a chance to defend themselves).
    It appears that the two graduate students who were said to be faking Western blots (!! yes !! more WBs!!) learned their craft from their supervisor, rather than the supervisor just being lax, as the DFG seems to be saying. The punishment seems rather mild for what appears to be pervasive fraud. Worse, the DFG seems to be trying to reassure the scientific community that the basic findings of the retracted papers are valid, despite the faked WBs.

    There’s a lot more to this story than we have been privy to so far.

    1. As I recall, Bulfone-Paus had to retract a paper in which neither of the two Bulgarians was a co-author. That has made her the only common denominator within the bundle of retractions originating from her lab. It would be great to know how she explained this.

    2. I think it is very problematic to count Bulfone-Paus’ voluntary withdrawal towards the sanctions imposed on her. In effect there is no punishment at all.

    3. Forschungszentrum Borstel has more details on its website: apparently only 4 of the 13 retracted studies were DFG-funded, so the other 9 studies were not part of the DFG investigation. Borstel points out that the disciplinary inquiry at the University of Lübeck has not yet been completed.

      Reference (in German):

  2. Today a student who is doing a project in the lab showed me some Excel files, and I immediately thought the error bars looked strange (they looked identical). She happened to have selected the same SD for all experiments in the same series, so of course we corrected that. She is still in college and is not that familiar with Excel. However, I remember seeing something similar when I did some sleuthing on Bulfone-Paus’s papers last year. It can happen, but it is very unlikely. All three papers below are unretracted.
    Fig. 1
    High-dose proinflammatory cytokines induce apoptosis of hair bulb keratinocytes in vivo.
    Rückert R, Lindner G, Bulfone-Paus S, Paus R.
    Br J Dermatol. 2000 Nov;143(5):1036-9. PMID: 11069516

    Fig. 7
    Blocking IL-15 prevents the induction of allergen-specific T cells and allergic inflammation in vivo.
    Rückert R, Brandt K, Braun A, Hoymann HG, Herz U, Budagian V, Dürkop H, Renz H, Bulfone-Paus S.
    J Immunol. 2005 May 1;174(9):5507-15. PMID: 15843549

    Fig. 3
    IL-15-IgG2b fusion protein accelerates and enhances a Th2 but not a Th1 immune response in vivo, while IL-2-IgG2b fusion protein inhibits both.
    Rückert R, Herz U, Paus R, Ungureanu D, Pohl T, Renz H, Bulfone-Paus S.
    Eur J Immunol. 1998 Oct;28(10):3312-20. PMID: 9808200

    1. In reply to Junk Science December 13, 2012 at 5:30 pm

      Here are my comments on the highly unlikely statistics in two BP papers:

      1. Eur. J. Immunol. 1998. 28: 3312–3320
      IL-15-IgG2b fusion protein accelerates and enhances a Th2 but not a Th1 immune response in vivo, while IL-2-IgG2b fusion protein inhibits both. René Rückert, Udo Herz, Ralf Paus, Daniela Ungureanu, Thomas Pohl, Harald Renz and Silvia Bulfone-Paus


      Figure 2, top panel: going down the panel, I notice the following.

      The error bars for the OVA + IL-15-IgG2b are all the same (3 out of 3 data points).
      The error bars for the OVA + IL-2-IgG2b are all the same (3 out of 3 data points).
      The error bars for the OVA are all the same (3 out of 3 data points).

      Next panel down.

      The error bars for the OVA + IL-2-IgG2b are all the same (3 out of 3 data points).
      The error bars for the control are all the same (2 out of 2 data points); the error bar for the day 7 reading is not visible.
      The error bars for the OVA are nearly the same (3 out of 3 data points).

      Lower panel.

      The error bars for the control are all the same (3 out of 3).
      The lines for the diamonds (OVA + IL-2-IgG2b) and the squares (OVA + IL-15-IgG2b) and the error bars are obscured.
      The only error bars we know to be different are those for day 7 and day 10 of the OVA (inverted triangle) line.

      Figure 3.

      The error bars for all the “medium” (open columns) are the same (6 out of 6).
      The error bars for all the “OVA” (black columns) are the same (6 out of 6).

      Figure 4.

      Left panel.

      All the error bars for the control (open columns) are the same (6 out of 6).

      Right panel.

      5 out of 6 error bars for the control (open columns) are the same.
      The error bars for all the “OVA” (black columns) are the same (6 out of 6).

      2. Eur J Immunol. 2003 Dec;33(12):3493-503.
      Dendritic cell-derived IL-15 controls the induction of CD8 T cell immune responses.
      Rückert R, Brandt K, Bulanova E, Mirghomizadeh F, Paus R, Bulfone-Paus S.

      Figure 1A.
      The error bars of the first 3 rows are essentially the same.
      The error bars of the 4th and 5th rows are essentially the same.
      Figure 1C.
      The error bars of the first 3 rows are the same.
      The error bars of the 4th and 5th rows are the same.
      Figure 2A. The error bars are the same.
      Figure 2B. The error bars are the same.
      Figure 2C. The error bars are minute. The error bars for the IL-6 columns are obscured by the heading.
      Figure 3B.
      2 of the 3 error bars on the IL-15 +/+ series are essentially the same.
      2 of the 3 error bars on the IL-15 -/- series are the same.
      I believe these occurrences are very unlikely to happen by chance.
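The closing claim of this reply, that so many identical error bars are very unlikely to arise by chance, can be made concrete with a small simulation. This is a hypothetical illustration, not a reanalysis of the papers: it assumes normally distributed measurements, n = 5 replicates per data point, and treats error bars as "looking identical" when their SEMs lie within 2% of each other. None of these numbers come from the papers, and the function names are made up for this sketch.

```python
import random
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / len(values) ** 0.5


def prob_identical_error_bars(n_per_point=5, n_points=3, rel_tol=0.02,
                              mu=100.0, sigma=15.0, trials=20_000, seed=1):
    """Monte Carlo estimate of the chance that n_points independent SEMs
    all fall within rel_tol of each other (largest / smallest <= 1 + rel_tol).

    Every default here is a hypothetical assumption, not a value from the papers.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw fresh replicates for each data point and compute its SEM.
        sems = [sem([rng.gauss(mu, sigma) for _ in range(n_per_point)])
                for _ in range(n_points)]
        if max(sems) / min(sems) <= 1 + rel_tol:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    p = prob_identical_error_bars()
    print(f"P(3 of 3 error bars look identical) ~ {p:.4f}")
```

Under these assumptions the estimate comes out well below 1%. When a figure contains several such series, the chance of all of them matching by accident is smaller still (the probabilities multiply if the series are independent).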

  3. Subject: Dermatology Open Access Initiative

    Dear Dr. Ralf Paus,

    In a publication entitled “Thyroid-Stimulating Hormone, a Novel, Locally Produced Modulator of Human Epidermal Functions, Is Regulated by Thyrotropin-Releasing Hormone and Thyroid Hormones”, published in 2010 in the journal Endocrinology, you are named as the corresponding author. I am interested in investigating statistical anomalies in specific dermatology papers. Therefore, I would like to ask whom I should contact to obtain the original data for figure 2I.

    The figure legend “Mean ± SEM. ***, P < 0.001; *, P < 0.05, n = 1. Gene expression of K5, K14 involucrin (Inv), and loricrin (Lor) was detected by quantitative real-time PCR (I).” is somewhat irritating given the small number of samples. Perhaps the small variance (0.06% and 0.04%) can be explained by this small number of samples (n = 1):
    "This appeared to be primarily a transcriptional effect of TSH because quantitative real-time PCR revealed that the intraepidermal steady-state mRNA levels for involucrin were also highly significantly up-regulated (361.7 ± 0.06% compared with the involucrin transcript levels of vehicle treated control epidermis) (Fig. 2I). In addition, we examined transcriptional changes of another terminal differentiation-associated keratinocyte marker gene, loricrin (33). Again, TSH treatment of organ-cultured full-thickness human scalp skin highly significantly (357.69 ± 0.04%) up-regulated loricrin transcription (Fig. 2I)."

    Was it really the Mann-Whitney test that was applied?

    Best regards,
    Markus Kuehbacher

    1. Good grief, how can anyone believe a number like +/- 0.06% for a qPCR test?
      More and more, I get the feeling that reviewers just don’t know or care about the technical details of the things they are supposed to be evaluating; qPCR, Western blots, flow cytometry, stats: it all just seems to be ignored by these people.
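Both the letter and the reply hinge on the legend’s “Mean ± SEM … n = 1”. A standard error needs the sample standard deviation, s = sqrt(Σ(x_i − x̄)² / (n − 1)), which is undefined when n = 1. The minimal sketch below (the helper `mean_and_sem` is hypothetical) shows that Python’s standard library refuses to compute it for a single replicate; 361.7 is simply the involucrin figure quoted in the letter, used here as a stand-in single measurement.

```python
import statistics


def mean_and_sem(values):
    """Return (mean, SEM) for a list of replicate measurements.

    statistics.stdev uses the n - 1 denominator, so it raises
    statistics.StatisticsError (a ValueError subclass) when n < 2.
    """
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / n ** 0.5


if __name__ == "__main__":
    print(mean_and_sem([1.0, 2.0, 3.0]))  # three replicates: SEM is defined
    try:
        mean_and_sem([361.7])  # a single replicate, i.e. n = 1
    except statistics.StatisticsError as err:
        print("cannot compute SEM:", err)
```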

  4. Subject: Another FASEB article with interesting statistical results

    Dear Ralf,

    The third paper that I have now read not only provokes serious questions but also offers an answer to the questions I raised in my previous emails. The article “Thyrotropin releasing hormone (TRH): a new player in human hair-growth control”, published in 2009 in the FASEB Journal, is not too old for the primary data to be examined. I would like to ask you: what p-value is possible when applying the Mann-Whitney test for n1 = n2 = 3?

    This statistical test has been “mentioned” quite often in your papers; therefore I assume that you must be familiar with this nonparametric test. Otherwise you can find some information here:

    The reason for my question is the significance symbols used in many of the figures in this article to report the results of such Mann-Whitney tests. For example, in the legend of figure 7 you wrote: “Columns represent means ± se; cumulative results of 3 consecutive experiments. *P < 0.05, ***P < 0.001 vs. control; Mann-Whitney test.” As you may know, applying the Mann-Whitney test for n1 = n2 = 3 can cause problems.

    Best regards,

    PS: In my next email I would like to address the problem of asymmetric error bars.
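The question in this letter has a definite answer that can be checked by enumeration. With n1 = n2 = 3 and no ties, there are only C(6, 3) = 20 equally likely ways to split the ranks 1 to 6 between the two groups under the null hypothesis, so even the most extreme split has an exact one-sided p-value of 1/20 = 0.05 and a two-sided p-value of 0.1. The sketch below assumes no tied values; the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def min_exact_mann_whitney_p(n1, n2):
    """Smallest exact one- and two-sided p-values attainable by the
    Mann-Whitney U test for group sizes n1 and n2, assuming no ties.

    Under the null hypothesis every assignment of the ranks 1..n1+n2
    to group 1 is equally likely, so we enumerate them all.
    """
    ranks = range(1, n1 + n2 + 1)
    # U for group 1 equals its rank sum minus n1 * (n1 + 1) / 2.
    u_values = [sum(c) - n1 * (n1 + 1) // 2 for c in combinations(ranks, n1)]
    total = len(u_values)
    assert total == comb(n1 + n2, n1)
    # The most extreme outcome is U = 0 (all of group 1 ranked lowest).
    one_sided = sum(u <= 0 for u in u_values) / total
    # The null distribution of U is symmetric, so double for two-sided.
    two_sided = min(1.0, 2 * one_sided)
    return one_sided, two_sided


if __name__ == "__main__":
    print(min_exact_mann_whitney_p(3, 3))  # (0.05, 0.1)
```

Under these assumptions, an exact Mann-Whitney test on three versus three values cannot return P < 0.05, let alone P < 0.001.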

  5. Subject: More questions about your research

    Dear Ralf,

    Many thanks for your prompt reply. Whilst you go some way to allaying my concerns, I’m afraid that my investigations of the same paper have thrown up some more queries, which I hope you will be able to answer by return.

    Figure 3 A: Please could you be so kind as to explain how two means with a value of 0% can both have error bars (you state that these are SEMs)? This would seem to be mathematically possible only if you had both positive and negative values for the numbers of hair follicles at a particular stage. Anybody would agree that a negative number of hair follicles makes no sense. Please could you explain?

    Figure 3 B: I have calculated the hair cycle scores (HCS) using the values you present in figure 3 A and found them to be 396, 215, and 400, which nicely reproduce the results shown in figure 3 B. However, I note that you get the same values only after division by the number of distinct hair cycle stages (n = 5), so I am at a loss to understand how you achieved these values after dividing by hair cycle stages. I would be grateful for an explanation. This also raises the question of how many hair follicles were categorized. Again, an explanation would be useful. Finally, there is the now-common problem of 3 out of 3 error bars being the same size, which, furthermore, do not match the variance shown in figure 3 A.

    Figure 6: Related to my question about the “missing” percentage, which you have kindly answered, I am bound to say that I am still confused by the text in the figure legend where you state: “Whereas hair follicles in vehicle-treated control skin were mostly in anagen VI”, which appears to contradict the results shown in the graph. Do 8 out of 9 error bars in the graph have the same size? Some clarification seems in order.

    As you may see, I am really interested in numbers, and I am sorry to say that some of your data have left me confused and, frankly, quite irritated. To assuage my irritation, I have been moved to read another of your papers, hoping to show that the confusion over the first is simply a “one-off”. The second of your papers that I have read, which provokes more serious questions, is entitled “Stress Inhibits Hair Growth in Mice by Induction of Premature Catagen Development and Deleterious Perifollicular Inflammatory Events via Neuropeptide Substance P-Dependent Pathways” (see Attachment). I have checked the given numbers statistically, with remarkable results. For that reason, I wondered if you could kindly provide me with the original data files; I note that the data are less than 10 years old. As a preliminary question, do the numbers (means and SEMs) in Table 1 really correspond to the presented graphs (means and SEMs)?

    Once again, many thanks for the prompt response and I hope to hear from you by return.

    Best regards,
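The Figure 3 A objection in this letter rests on a short argument: hair follicle percentages cannot be negative, so a mean of exactly 0% forces every individual value to be 0, and then the spread, and with it the SEM, must be 0 as well. Writing x_i ≥ 0 for the n individual percentages (symbols introduced here for illustration only):

```latex
\begin{aligned}
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 0,\quad x_i \ge 0
  &\;\Longrightarrow\; x_i = 0 \quad \text{for all } i,\\
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = 0
  &\;\Longrightarrow\; \mathrm{SEM} = \frac{s}{\sqrt{n}} = 0.
\end{aligned}
```

A visible error bar on a 0% mean would therefore require at least one negative value, which is the letter's point.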

  6. Subject: Original data (He who wishes to avoid Charybdis falls into Scylla)

    Dear Ralf,

    “When I see the wicked abounding in wealth, vices ruling, virtues succumbing, women despised and men taking husbands, it is hard for us not to write satire.”

    Best regards,

    PS: “The manifold fraud and injustice of men, deadly ambition, theft and pandering compel me to begin thus, turning to the vices: what madness, O citizens, what great licence is this!”

  7. The difference between sport, undergraduate degrees and research is interesting.
    Lance Armstrong risked his life cheating and eventually lost his medals and his prize money.
    Undergraduates can lose their degree.
    In research you are put on the “naughty step” and then can continue.

  8. Where’s the beginning of a thing? Solving that would at least give a satisfying answer.

    Der Spiegel now reports:

    “Fünf der dreizehn zurückgezogenen Studien sind am Berliner Universitätsklinikum Benjamin Franklin entstanden, das nun zur Charité gehört, oder direkt an der Charité.”

    “Five of the thirteen retracted studies originated at the Berlin university medical center Benjamin Franklin, which now belongs to the Charité, or directly at the Charité.”

    It has recently come to light that Harald Stein, a senior professor at the Universitätsklinikum Benjamin Franklin and a co-author with Bulfone-Paus on 4 other publications, also has a somewhat strange publication from that time.
