It’s been a busy few months for Brian Wansink, a prominent food researcher at Cornell University. A blog post he wrote in November prompted a huge backlash from readers who accused him of using problematic research methods to produce questionable data, and a group of researchers suggested four of his papers contained 150 inconsistencies. The scientist has since announced he’s asked a non-author to reanalyze the data — a researcher in his own lab. Meanwhile, criticisms continue to mount. We spoke with Wansink about the backlash, and how he hopes to answer his critics’ questions.
Retraction Watch: Why not engage someone outside your lab to revalidate the analysis of the four papers in question?
Brian Wansink: That’s a great question, and we thought a lot about that. In the end, we want to do this as quickly and accurately as possible – get the scripts written up, state the rationale (i.e., why we made particular choices in the original paper), and post it all on a public website. Also, because this same researcher will be deidentifying the data, it’s important to keep everything corralled together until all of this gets done.
But before we post the data and scripts, we also plan on getting some other statisticians to look at the papers and the scripts. These will most likely be stats profs who are at Cornell but not in my lab. We’ve already requested one addition to [the Institutional Review Board (IRB)], so that’s speeding ahead.
But even though someone in my lab is doing the analyses, like I said, we’re going to post the deidentified data, the analysis scripts (as in, how everyone is coded), tables, and log files. That way everyone knows exactly how it’s analyzed and they can rerun it on different stats programs, like SPSS or STATA or SAS, or whatever. It will be open to anyone. I’m also going to use this data for some statistical analysis exercises in one of my courses. Yet another reason to get it up as fast as possible – before the course is over.
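For readers unfamiliar with the workflow Wansink describes, a deidentification step before public posting typically looks something like the following. This is a minimal sketch in Python under assumed column names and an assumed salted-hash scheme; it is illustrative only, not the lab's actual pipeline:

```python
import hashlib
import pandas as pd

# Hypothetical identifiers; the real dataset's column names are not public.
DIRECT_IDENTIFIERS = ["name", "email", "phone"]

def deidentify(df: pd.DataFrame, salt: str) -> pd.DataFrame:
    """Drop direct identifiers and replace subject IDs with salted hashes,
    so rows can still be linked across files without exposing identities."""
    out = df.drop(columns=DIRECT_IDENTIFIERS, errors="ignore").copy()
    out["subject_id"] = [
        hashlib.sha256((salt + str(sid)).encode()).hexdigest()[:12]
        for sid in out["subject_id"]
    ]
    return out

# Example use (the salt stays private so hashes can't be reversed by lookup):
# deidentify(raw, salt="keep-this-secret").to_csv("public_data.csv", index=False)
```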
RW: A number of commenters have raised concerns about the general research approach you took in the four papers. As in – here’s a dataset, let’s try to get some papers out of it. How do you respond to accusations of p-hacking or HARKing?
BW: Well, we weren’t testing a registered hypothesis, so there’d be no way for us to try to massage the data to meet it. From what I understand, that’s one definition of p-hacking. Originally, we were testing a hypothesis – we thought the more expensive the pizza, the more you’d eat. And that was a null result.
But we set up this two-month study so that we could look at a whole bunch of totally unanswered empirical questions that we thought would be interesting for people who like to eat in restaurants. For example, if you’re eating a meal, what part influences how much you like the meal? The first part, the middle part, or the last part? We had no prior hypothesis to think anything would predominate. We didn’t know anybody who had looked at this in a restaurant, so it was a totally empirical question. We asked people to rate the first, middle, and last piece of pizza – for those who ate 3 or more pieces – and asked them to rate the quality of the entire meal. We plotted out the data to find out which piece was most linked to the rating of the overall meal, and saw ‘Oh, it looks like this happens.’ It was total empiricism. This is why we state the purpose of these papers is ‘to explore the answer to x.’ It’s not like testing Prospect Theory or a cognitive dissonance hypothesis. There’s no theoretical precedent, like the Journal of Pizza Quality Research. Not yet.
Field studies aren’t lab studies. They’re so darned involved that, in addition to the main thing you’re testing, we usually try to explore empirical answers to other questions that haven’t been answered yet but which might come up in this real-world situation. Like, do guys eat more with women or with other guys? If there’s a provocative answer to one of these, it can be tested in more detail in the lab, if merited. For instance, it could be the first essay in a dissertation, and then it could be followed up with a couple of lab studies to confirm or disconfirm it. In this case, her dissertation went in a different direction once she got back to her own university. As a result, these ended up as single exploratory studies.
These sorts of studies are either first steps, or sometimes they’re real-world demonstrations of existing lab findings. They aren’t intended to be the first and last word about a social science issue. Social science isn’t definitive like chemistry. Like Jim Morrison said, “People are strange.” In a good way.
RW: Cornell has said they think it’s up to investigators to decide if they should release data or not, balancing the need for confidentiality. Do you agree with that?
BW: I do agree with that. I think having researcher independence is a good idea, in the spirit of academic freedom. Having said that, this experience is changing how we’re doing things in the lab. Prior to this, we had no mechanisms or conventions in place to easily locate previous datasets that were collected 7 or 9 years ago, let alone give them to somebody given the high standard for IRB confidentiality agreements we’ve been using for 10 years.
Going forward, we’re going to try to make the major datasets we collect – particularly field study data – publicly available around the time we publish a paper. Since some of this research in grocery stores or restaurants is proprietary, in the past we have signed agreements saying we wouldn’t share sales data with anyone. But moving forward, I think we can loosen those up a bit. Just last week we modified the template agreement letters that subjects sign, so that we will protect their confidentiality but still ask them if we can share some aspects – like age, height, and weight – if they consent. In the past, we promised them we wouldn’t share anything about them. That’s pretty restrictive.
We’ve already changed this with our lab studies, and we’ll be doing something similar with the next field studies we run. I’m sure there’s going to be some learning with this, but I think it will also result in a useful and more general set of guidelines and protocols that other labs can use when they do field studies as part of their research.
RW: Critics have identified a number of numeric inconsistencies in your papers – just this week, an article in Medium pointed out problems in papers other than the four you’ve agreed to re-analyze. How do you respond to allegations that some of the numbers in your papers don’t add up?
BW: Studies in elementary school lunchrooms are different from running a reaction time study on a computer keyboard. Nobody starts a food fight or steals an apple during a reaction time study.
In the elementary school studies, there were times when the math of what food kids were given versus what they ate or left behind didn’t add up perfectly, because the study was done in elementary schools (and based on the well-cited quarter-plate data collection method referenced in that paper). For anybody who can remember school lunches as an 8-year-old – some amount of it ends up on the floor or in pockets. Also, to be conservative, when we state percentage increases or decreases, we usually try to calculate them from the average level of the range and not from the top or bottom.
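To make that arithmetic concrete (this worked example is ours, not from the papers): if intake rises from 40 g to 60 g, the increase is 50% measured from the bottom of the range, about 33% from the top, and 40% from the midpoint. A minimal sketch:

```python
def pct_change_from_midpoint(before: float, after: float) -> float:
    """Percentage change relative to the average of the two values,
    rather than relative to the lower (or higher) value."""
    midpoint = (before + after) / 2
    return 100 * (after - before) / midpoint

# 40 g -> 60 g: 50% from the bottom, ~33% from the top, 40% from the midpoint.
print(pct_change_from_midpoint(40, 60))  # 40.0
```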
With regards to the four papers that were originally questioned, we haven’t gotten the final report from the non-coauthor econometrician, the one in our lab, but many of the inconsistencies noted through granularity testing will be due to people skipping survey questions. For instance, you might report that there are 30 people in one condition, but for any given question anywhere from 26-30 might answer it, so it would get flagged by a granularity test. These people were eating lunch and they could skip any question they wanted – maybe they were eating with their best buddy or girlfriend and didn’t want to be distracted. So that explains many of the differences in the base figures some readers had noticed.
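For context, the "granularity testing" Wansink refers to is the kind of check often called a GRIM test, which asks whether a reported mean of integer-valued responses is arithmetically possible given the reported sample size. The sketch below, with made-up numbers, shows how a mean that is impossible at n = 30 becomes possible when only 27 of the 30 people answered that item, which is the kind of flag he describes:

```python
def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """True if a mean of integer-valued responses from n respondents
    could round to reported_mean at the given number of decimals."""
    nearest_sum = round(reported_mean * n)  # closest achievable integer sum
    return round(nearest_sum / n, decimals) == round(reported_mean, decimals)

# With all 30 respondents answering, a reported mean of 3.44 is impossible ...
print(grim_consistent(3.44, n=30))  # False
# ... but consistent if only 27 of the 30 answered (93 / 27 = 3.444...).
print(grim_consistent(3.44, n=27))  # True
```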
Also, we realized we asked people how much pizza they ate in two different ways – once by asking them to provide an integer for how many pieces they ate, like 0, 1, 2, 3 and so on, and another time by asking them to put an “X” on a scale that just had a “0” and a “12” at either end, with no integer marks in between. It was just a silly way to ask the question. That’s the one inconsistency we’ve identified so far, but fortunately, when the coauthors have since looked at these, the conclusions are about the same as when integers are used.
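One way a reader might reconcile the two response formats (an illustration on our part; the papers' actual coding scheme isn't described here) is to convert the position of each "X" on the unmarked 0-to-12 line into an estimated slice count and compare it with the integer answers:

```python
def slices_from_mark(x_mm: float, line_mm: float = 100.0) -> float:
    """Convert the position of an 'X' on an unmarked 0-to-12 line
    (measured in mm from the left end) into an estimated slice count."""
    return 12.0 * x_mm / line_mm

# A mark 25 mm along a 100 mm line implies about 3 slices, which can be
# compared against the integer count the same respondent reported.
print(slices_from_mark(25.0))  # 3.0
```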
Across all sorts of studies, we’ve had really high replication of our findings by other groups and other studies. This is particularly true with field studies. One reason some of these findings are cited so much is because other researchers find the same types of results. When other people start finding the same things, that moves social science ahead. That’s why replication studies are useful. Still, even replication studies need some toeholds to get started. It’s kind of strange to think of some of your studies as being toeholds, but at least they’ve been useful toeholds.