
Retraction Watch

Tracking retractions as a window into the scientific process

Anatomy of an inquiry: The report that led to the Jens Förster investigation

with 239 comments


Jens Förster

We have obtained a copy of the report that led to the investigation of Jens Förster, the social psychologist at the University of Amsterdam. The university is calling for the retraction of a 2012 article by the researcher because of manipulated data.

As we reported earlier, Förster has denied any wrongdoing in the matter.

But as the report makes clear, investigators identified several red flags in Förster’s work. Here’s the abstract, which makes for interesting reading:

Here we analyze results from three recent papers (2009, 2011, 2012) by Dr. Jens Förster from the Psychology Department of the University of Amsterdam. These papers report 40 experiments involving a total of 2284 participants (2242 of which were undergraduates). We apply an F test based on descriptive statistics to test for linearity of means across three levels of the experimental design. Results show that in the vast majority of the 42 independent samples so analyzed, means are unusually close to a linear trend. Combined left-tailed probabilities are 0.000000008, 0.0000004, and 0.000000006, for the three papers, respectively. The combined left-tailed p-value of the entire set is p = 1.96 × 10^-21, which corresponds to finding such consistent results (or more consistent results) in one out of 508 trillion (508,000,000,000,000,000,000). Such a level of linearity is extremely unlikely to have arisen from standard sampling. We also found overly consistent results across independent replications in two of the papers. As a control group, we analyze the linearity of results in 10 papers by other authors in the same area. These papers differ strongly from those by Dr. Förster in terms of linearity of effects and the effect sizes. We also note that none of the 2284 participants showed any missing data, dropped out during data collection, or expressed awareness of the deceit used in the experiment, which is atypical for psychological experiments. Combined these results cast serious doubt on the nature of the results reported by Dr. Förster and warrant an investigation of the source and nature of the data he presented in these and other papers.

Read the whole report here.
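The check described in the abstract can be run from summary statistics alone: for each three-level sample, test how far the middle condition mean deviates from the straight line through the outer means, take the left tail of the resulting F distribution as a measure of "too much linearity," and then combine those tail probabilities across samples. Below is a minimal sketch of that general idea in Python, with made-up numbers, using a standard one-degree-of-freedom nonlinearity contrast and Fisher's method for combining p-values; it is an illustration, not the investigators' actual code, and the report describes its own procedure.

```python
import numpy as np
from scipy import stats

def nonlinearity_left_tail_p(means, sds, ns):
    """Left-tail p for the deviation-from-linearity contrast in a 3-cell design,
    computed only from reported means, standard deviations and cell sizes."""
    means, sds, ns = map(np.asarray, (means, sds, ns))
    c = np.array([1.0, -2.0, 1.0])                       # middle mean vs. average of outer means
    contrast = np.sum(c * means)
    df_within = np.sum(ns) - 3
    ms_within = np.sum((ns - 1) * sds**2) / df_within    # pooled within-cell variance
    f = contrast**2 / (ms_within * np.sum(c**2 / ns))    # F for the contrast, 1 numerator df
    # A small left-tail value means the means lie closer to a straight line
    # than sampling error would normally allow.
    return stats.f.cdf(f, 1, df_within)

def fisher_combined_p(pvals):
    """Combine independent left-tail p-values with Fisher's method."""
    pvals = np.asarray(pvals, dtype=float)
    chi2 = -2.0 * np.sum(np.log(pvals))
    return stats.chi2.sf(chi2, 2 * len(pvals))

# Made-up example: three condition means lying almost exactly on a line.
p = nonlinearity_left_tail_p(means=[3.0, 4.01, 5.0], sds=[1.2, 1.1, 1.3], ns=[20, 20, 20])
print(p)                                    # a single suspiciously small left-tail p
print(fisher_combined_p([p, 0.03, 0.01]))   # combined across several (made-up) samples
```

Applied to dozens of samples, this kind of check is what produces the vanishingly small combined probabilities quoted in the abstract.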

Please see an update on this post, including the final LOWI report.

Like Retraction Watch? Consider supporting our growth. You can also follow us on Twitter, like us on Facebook, add us to your RSS reader, and sign up on our homepage for an email every time there’s a new post.


Written by Adam Marcus

April 30, 2014 at 4:10 pm

239 Responses


  1. A very interesting simulation by Leif Nelson and Uri Simonsohn:

    http://datacolada.org/2014/05/08/21-fake-data-colada/

    Hellson

    May 8, 2014 at 3:04 pm

  2. To contradict some statements that have been made here, the investigators did not have the data. Their report specifically mentions that they obtained additional information from the authors only for the 2012 paper (n and sd); everything else comes from the papers themselves. If the data files were made available, that would certainly help explain what was done, and that is probably why they aren’t being released. It would be extremely difficult to create data that matched the given results but did not show signs of manipulation. It was also mentioned somewhere that the original paper records were thrown out. So no consent forms, nothing. That does happen, but it is rather convenient in this case, and it is poor research conduct. In this type of study, often not all information from participants is transferred to computer files.

    Ken

    May 10, 2014 at 7:26 am

  3. Another interesting issue:

    In 2010 Forster published this theoretical article:

    http://www.tandfonline.com/doi/pdf/10.1080/1047840X.2010.487849

    Here (http://www.socolab.de/main.php?id=66) Forster writes the following about the experiments described in the article for which LOWI concluded that data manipulation must have occurred:

    “The series of experiments were run 1999 – 2008 in Germany, most of them Bremen, at Jacobs University; the specific dates of single experiments I do not know anymore”.

    So, when the theoretical article was published in 2010, all those 24 experiments described in the 2011 and 2012 articles had already been conducted?

    SPMETH

    May 12, 2014 at 6:30 am

  4. Yes, some cases seem clear-cut. Should the University of Sydney’s long-running “initial inquiry” advise the university board to call for retraction of the extraordinarily faulty “Australian Paradox” paper? http://www.australianparadox.com/pdf/RRsubmission2inquiry.pdf

  5. GJ: As I said, I find the results convincing. My post is a comment on the process that is going on now, primarily that people are making very bold claims here based on extremely limited information, and on the way this so-called report has been publicized here. That is the danger of posting something like this on the internet. I just question the wisdom of doing so… I am just a worried citizen and I think the public scolding and the entire accusation process should be dealt with in a more careful manner. We are talking about a person here. Regardless of the evidence, the fact that this accusation is published here in such a way is likely to create a sense of fear among all researchers. That doesn’t help the overall situation in the field, in my humble opinion. It wasn’t completely clear to me which report referred to this case; thanks for posting the link. But this only adds to my point, which is that it is difficult from all of these small pieces of information to get an overview of what is going on. The result is that people will focus on and comment on the one thing they understand from the documents.

    When I was talking about fabrication I was indeed referring to Stapel-esque practices, which seemed to be what was being suggested. Of course, running a series of control conditions until you find what you want is also fabrication in the second sense in which you are referring to it. If you believe these data to be too good to be true, then simply removing some participants in a selective way would not have been able to create this pattern, would it? That’s a serious question: in order to be able to selectively remove participants to get precisely the values in the conditions that you want and precisely the number of participants for a perfectly balanced design, how big would those samples really have to have been? This seems improbable, so IF you say it is too good to be true, doesn’t that imply actually accusing someone of fabricating data in the first sense, the Stapel sense?

    Richard: If you supervise 10-20 thesis students, most of whom conduct several experiments, you are already seeing data of at least 20 experiments in a year. Add to this that if you are successful, you have one or more RAs running experiments for you as well and that you might even collect some data yourself…. Your comment is implying that conducting a lot of experiments means one is capitalizing on chance or trying to do so. I didn’t say people run the same experiment 40 times.

    The point that I was making is that there are many questions one could ask about the background of the data and the studies, questions that we are now just skipping over a bit too easily for my taste. The broader implications of this skipping over can be disastrous for researchers who go about their work in a valid scientific way. If we suddenly, for instance, all believe that large effect sizes are indicative of fraud, then who doesn’t become a suspect?

    The problem is in the process that follows when one is accused (right or wrong); as I said before, I agreed that there seems no way around the linearity point. Just because that is so, doesn’t mean everything else suddenly becomes a valid argument.

    Joop

    May 4, 2014 at 5:06 pm

  6. In reply to Joop, who writes “Richard: If you supervise 10-20 thesis students, most of whom conduct several experiments, you are already seeing data of at least 20 experiments in a year”

    Again, it is rather unethical not to acknowledge the contributions of such students in a paper (especially for someone of this “caliber” who has been a member of a national-level ethics board). So, there are two possibilities: 1) there were other contributors to this work who have not been acknowledged, or 2) he did it all himself, which raises the question of whether there was a conflict between carrying out that much experimental work and guaranteeing its quality (and that is of course the issue we are discussing in the first place).

    There is a lot of talk about how we need to be cautious about the person and how we might not have all the facts. I agree with that, but at the same time, it is rare that discussions like this one actually occur. It is happening because so many things do not add up, while this researcher was about to receive a massive grant that other researchers did not get.

    Johny

    May 4, 2014 at 5:52 pm

  7. I am a former student of Jens Förster and I also worked as a student assistant in his lab quite some time ago. I am not currently working with him and haven’t worked for him for quite a while, so I cannot say anything about the papers in question, since the data was collected after my time as a student assistant. I can say, however, that the papers based on data that I helped to collect DO list me and the other student assistants involved. In fact, in my experience, Jens always did a rather good job of publicly recognizing the contribution of anyone working with him. He and his PhD students at the time also did their best to support the student assistants in the lab in their studies, which is more than I can say about some other senior researchers I have met.

    sannanina

    May 5, 2014 at 7:32 am

  8. “simply removing some participants in a selective way would not have been able to create this pattern, would it?”
    Selective removal of data points that lie away from the desired position is a surprisingly powerful way to get exactly what is seen here: a highly significant study endpoint, but with unnaturally shaped distributions. [A simulation sketch of this mechanism follows this comment.]

    http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0065323

    Researchers taking this approach will be safe if a later university investigation demands to verify the source data, since the data will always exist. The mischief is all in the deletion, which source data verification does not detect.

    Prof Darrel Francis

    May 6, 2014 at 3:24 am
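A rough simulation sketch of the mechanism described in the comment above (illustrative only, with invented numbers; it is not taken from the linked PLOS ONE paper): start from two groups with no true difference, then delete the observations that work against the hoped-for effect. Every surviving record is genuine, so the remaining raw data would pass a source-data audit, yet the test becomes highly "significant."

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, 60)      # no true effect in either group
treatment = rng.normal(0.0, 1.0, 60)

# Selectively delete the cases that oppose the desired "treatment > control" pattern:
# drop the 15 lowest treatment scores and the 15 highest control scores.
treatment_kept = np.sort(treatment)[15:]
control_kept = np.sort(control)[:-15]

print(stats.ttest_ind(treatment, control).pvalue)            # full data: typically non-significant
print(stats.ttest_ind(treatment_kept, control_kept).pvalue)  # after deletion: typically p < .001
```

Because every retained data point really was collected, inspecting the surviving records cannot reveal the deletions; only the complete original records, or implausibly shaped distributions, can.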

  9. Johny, I completely agree that if students contributed to the data collection during their thesis, this should be acknowledged. This is something that is up for debate, but I think that when it comes to these acknowledgments (and reporting in general) it is best to err on the side of caution (and to be charitable to any contributor). I am just saying that it is not only possible, but likely, that he conducted a large number of experiments. Note that I was responding to the more general situation that Richard was talking about, because he says it is not ethical per se to conduct 20 or more experiments in a year. Ethics boards have traditionally (pre-Stapel) been mostly concerned with the ethical treatment of participants, not so much with the ethics of the data reporting process (this has changed now, which is obviously a good thing). I don’t know that his having been involved in such a committee in the past is an argument for anything, although as I said I agree there is definite sloppiness in the reporting of participant information.

    As to your possibility 2, I don’t think it is likely that he did the research himself. Full professors do not generally sit in the laboratory to collect data. I tend to agree that there is something going on here, but there is a lot of speculation going on, and I am not convinced that it is necessarily a good thing.

    Joop

    May 5, 2014 at 1:36 am

  10. It is not uncommon for papers published in the journal “Social Psychological and Personality Science” to have no Acknowledgments section and no information about who assisted the author(s) in conducting the experiments. On the other hand, those papers often discuss the findings of 80-150 participants (often psychology students), so it is quite possible that the author(s) conducted all the experiments by themselves.

    Bianca Beersma and Gerben van Kleef of the Department of Work and Organizational Psychology, University of Amsterdam, Netherlands also have published a paper in this journal (“How the grapevine keeps you in line: gossip increases contributions to the group”, http://spp.sagepub.com/content/2/6/642.abstract ).

    They state:

    * “Participants were 147 undergraduate students at a large university in the Netherlands (47 males and 100 females, mean age 22 years), who participated in the study for course credits or 7 Euros. The experiment had a two (group members’ tendency to gossip: high vs. low) X (identifiability: absent vs. present) full-factorial design. Participants were randomly assigned to conditions using a double-blind procedure.”

    * “Acknowledgment The authors thank Gunnhildur Sveinsdottir for her help with collecting the data.”

    There is indeed a lot of speculation going on, but this speculation is strongly biased. The complainant has released a report which lists all the shortcomings of the three different papers. Anyone can read this report and anyone can start to debate it. Quite a few people here have already made comments on this report.

    On the other hand, Jens Förster, his co-author Markus Denzler, the reviewers of the three papers and the editors of the journals have not reacted, and they also have not released a report (or anything similar) in which the problems with the three papers are rebutted.

    There are quite a few threads on Retraction Watch where both parties openly debate such topics with each other. Examples are http://retractionwatch.com/2014/04/14/anonymous-blog-comment-suggests-lack-of-confidentiality-in-peer-review-and-plays-role-in-a-new-paper/ and http://retractionwatch.com/2014/03/21/i-am-deeply-saddened-and-disturbed-co-author-of-retracted-nature-paper-reveals-how-problems-came-to-light/

    Please also read carefully the preliminary decision of the Board of UU (Utrecht University) in the case against Pankaj Dhonukshe, listed in the attachment of the contribution of Pankaj Dhonukshe of 25 March 2014.
    The Board of UU decided that Pankaj had done a lot of things not very well (‘sloppy science’) and Pankaj himself admitted that he had made several mistakes (= honest errors, etc.). However, the Board decided that Pankaj had not violated academic integrity, mainly because Pankaj had immediately sent an e-mail to the editor of Nature to tell Nature that there were concerns about parts of a paper he had published in Nature. This behaviour of Pankaj is in line with I.10 of the Code of Conduct (I. Scrupulousness: scientific activities are performed scrupulously); see page 5 of http://www.uu.nl/SiteCollectionDocuments/The%20Netherlands%20Code%20of%20Conduct%20for%20Scientific%20Practice%202012.pdf

    Klaas van Dijk

    May 5, 2014 at 4:53 am

  11. Jens Förster wrote: “The only thing that can be held against me is the dumping of questionnaires (that by the way were older than 5 years and were all coded in the existing data files) because I moved to a much smaller office. (…). This was suggested by a colleague who knew the Dutch standards with respect to archiving. I have to mention that all this happened before we learned that Diederik Stapel had invented many of his data sets.”

    Jens Förster moved in 2007 from Bremen to Amsterdam ( http://www.dgps.de/index.php?id=199 ). So the dumping of the questionnaires took place in Amsterdam in the period 2007-2011, and the questionnaires had been collected more than five years earlier.

    “Sannanina” wrote in this thread on RW: “I am a former student of Jens Förster and I also worked as a student assistant in his lab quite some time ago. I am not currently working with him and haven’t worked for him for quite a while, so I cannot say anything about the papers in question since the data was collected after my time as a student assistant.”

    ———————————————————————————

    1. Markus Denzler, Jens Förster & Nira Liberman, 2009, How goal-fulfillment decreases aggression, Journal of Experimental Social Psychology 45: 90–100, received 10 January 2007; revised 26 August 2008, available online 6 September 2008:

    “Experiment 1. Ninety-one participants (51 women, 40 men) from University of Würzburg participated in a series of studies and received €12 (at the time approximately US$14) as compensation. There were no gender differences in any of the results reported below. All participants first filled out the same questionnaires unrelated to the present experiment for about 15 min. The present study was introduced as a study on perspective-taking, which for economic reasons was added to an allegedly unrelated study on reaction times and verbal comprehension.(…). After the experiment, participants were thanked, fully debriefed, paid and dismissed. (…). We examined speed of lexical decision after excluding incorrect responses (1.2% of the responses). Here and also in the following studies incorrect responses did not differ across conditions, and hence are not further addressed”

    “Experiment 2. Fifty-two participants (25 women, 27 men) from Bremen University participated in a battery study and received €12 (at the time approximately US$14) as compensation. One participant had to be excluded because he was not a native German speaker. There were no gender differences in any of the results reported below. (…). We excluded incorrect responses (2.3% of the responses)”

    “Experiment 3. Eighty-five (44 women, 41 men) participants from Bremen University were recruited for a battery study and received €12 (at the time approximately US$14) as compensation. Because not all reaction times were recorded for two participants due to computer problems, we excluded them from the analyses. There were no gender differences in any of the results reported below. (….). We excluded incorrect responses (2.8% of the responses)”

    2. Ronald Friedman, Jens Förster & Markus Denzler, 2007, Interactive Effects of Mood and Task Framing on Creative Generation, Creativity Research Journal 19: 141–162:

    “Experiment 1. Sixty-five undergraduates at the University of Missouri–Columbia were recruited for a study described as involving a number of separate tasks, including one in which they would be asked to write about themselves. Participants completed the study in groups of up to 5 during sessions that lasted approximately 30 min, and received course credit for participation. Upon arrival, participants were seated at computer stations, visually isolated from one another by means of sound-attenuating paravents. The entire procedure was administered by computer using MediaLab experimental software.

    “Experiment 2. One hundred and five undergraduates at the University of Missouri–Columbia were recruited for a study described as involving a number of separate tasks, including one in which they would be asked to write about themselves. Participants completed the study in groups of up to 5 during sessions that lasted approximately 30 min, and received course credit for participation. The procedure was virtually identical to that of Experiment 1.

    “Experiment 3. One hundred and thirty-five university undergraduates and high school students from the Bremen area, majoring in disciplines other than Psychology were recruited for a study described as consisting of a number of different projects, including one in which they would be evaluating TV shows. The experiment was conducted at the International University Bremen (IUB). Participants completed the study individually in sessions that lasted approximately 2 hr and received 14 Euro for participation. Ten participants indicated a lack of familiarity with the TV shows used in the procedure (see below) and thus were excluded from the analyses.”

    3. Jens Förster, Ronald Friedman, Amina Özelsel & Markus Denzler, 2006, Enactment of approach and avoidance behavior influences the scope of perceptual and conceptual attention, Journal of Experimental Social Psychology 42: 133–146, received 8 December 2003; revised 28 October 2004, available online 29 April 2005:

    “Experiment 1. Sixty undergraduate students at the University of Bremen majoring in disciplines other than Psychology were recruited for an experimental session consisting of “a number of diverse psychological tests” to take place at the International University Bremen. Participants were run in groups during sessions that lasted approximately 2 h and were paid D16 for their participation. Data from four participants were not recorded due to computer error and was thus excluded from the analyses. (…). After the entire experimental session was completed, participants were probed for suspicions, debriefed, paid, and released. No suspicions regarding the connection between the maze manipulation and the global/local reaction time measure were voiced.”

    “Experiment 2. Fifty-four undergraduate students at the University of Würzburg majoring in disciplines other than Psychology were recruited for an experimental session consisting of “a number of diverse psychological tests.” Participants were run in groups during sessions that lasted approximately 1 h and were paid DM 12 for their participation.”

    “Experiment 3. Thirty undergraduate students at the University of Würzburg majoring in disciplines other than Psychology were recruited for an experimental session consisting of “a number of diverse psychological tests.” Participants were run in groups during sessions that lasted approximately 90min and were paid DM 18 for their participation. Two participants were excluded from the analysis because a majority of their lexical decisions were incorrect. (…). After the entire experimental session was completed, participants were probed for suspicions, debriefed, paid, and released. No suspicions regarding the connection between the maze manipulations and the subsequent tasks were voiced.”

    ——————————————————

    The report of the complainant (https://retractionwatch.files.wordpress.com/2014/04/report_foerster.pdf) states:

    “Förster & Denzler (2012): the 12 randomized experiments in this paper involve a total of 690 undergraduates of which 373 were female. Participants received either 7 Euros or course credit for their one-hour participation. (..). Page 110 of the paper states that “At the end of the entire session, participants were debriefed; none of them saw any relation between the two phases.” It is uncommon to find (psychology) undergraduates with no suspicions concerning the goal of the studies in sample of 690, because these undergraduates are often trained in psychological research methods. It is also uncommon to have no dropout of participants or missing data in such a large sample.”

    “Förster (2011): this paper reports results of 18 randomized experiments involving a total of 823 undergraduates. Of these 509 (61.8%) were female. All 823 participating undergraduates were probed for suspicion concerning the relation between the tasks in the experiment. None of them saw any relation between the tasks. This is highly unlikely in such a large sample containing undergraduates who are typically trained in psychological research methods and who are often quite experienced as research participants. The lack of missing data and dropout is also not characteristic of psychological experiments of this type in such a large sample.”

    “Förster (2009): This paper reports a total of 12 experiments, involving 736 undergraduates and 42 business managers. (…). The paper does not report any dropout or missing data among any of the 778 participants. This is atypical of psychological experiments. All participants were probed for suspicion concerning the goal of the studies. None of the 736 undergraduates and 42 business managers raised the possibility that the different study phases were related. This is quite unexpected in such a large sample containing undergraduates who are often trained in psychological research methods and are experienced as participants.”

    “Although the origin of the undergraduates is not explicated, it is likely that they were (predominantly) from the University of Amsterdam, at least for the 2011 and 2012 papers. All participants were debriefed after each experiment, so it is implausible that undergraduates returned for later experiments by Dr. Förster without any of them expressing awareness of the research hypothesis of (or the deceit used in) the later experiment. So the number of undergraduates participating in the 40 experiments cannot be attributed to the reuse of undergraduates from the same pool of participants. This raises further questions about the origin of the data.”

    —————————————————-
    Förster (2009) was published in February 2009

    http://psycnet.apa.org/index.cfm?fa=buy.optionToBuy&id=2009-01083-008

    ————————————————–

    Excuse me very much, but I fail to understand when and where all these 42 experiments, with a total of 2242 undergraduates, were carried out. Is there anyone here who can tell me more about this?

    So all these experiments collect and store information about the sex of the participants, but how about collecting and storing information on the date (e.g., 6 May 2014) and site (e.g., Amsterdam, the Bremen region) for undergraduates participating in such experiments?

    Can anyone here tell me a bit more about the general design of such experiments carried out at UvA (in particular about collecting information on site and date)?

    Klaas van Dijk

    May 6, 2014 at 4:28 am

  12. The date should be on the consent form each subject signs, and on the master sheet of subject signatures. The site should be obvious from the consent forms as well. But I guess the problem is that he does not have those consent forms anymore…

    Helen Arbib

    May 6, 2014 at 5:39 pm

  13. One problem is always determining the difference between bad and dishonest research. The Australian Paradox definitely qualifies as the former; it only qualifies as the latter if they knew it was bad, and it is rather difficult to prove what people believed.

    Ken

    May 10, 2014 at 7:59 pm

  14. Ken,

    It’s not the extraordinarily faulty original Australian Paradox paper that made the Australian Paradox episode a clear case of research misconduct, in my opinion; it’s the authors’ determined post-publication exaggeration – now in two formal journals, as well as on Australian national radio – of the evidence for their “finding” of “an inverse relationship” between sugar consumption and obesity: pp 6-8 in http://www.australianparadox.com/pdf/RRsubmission2inquiry.pdf


