Retraction Watch

Tracking retractions as a window into the scientific process

Backlash prompts prominent nutrition researcher to reanalyze multiple papers

with 18 comments

Brian Wansink

To Brian Wansink of Cornell University, a blog post he wrote in November 2016 was meant as a lesson in productivity: A graduate student who was willing to embrace every research opportunity submitted five papers within six months of arriving at his lab, while a postdoc who declined two chances to analyze a data set left after one year with a small fraction of the grad student’s publications.

But two months and nearly 50 comments on the post later, Wansink — known for so much high-profile nutrition research that he’s been dubbed the “Sherlock Holmes of food” — has announced he’s now reanalyzing the data in the papers and will correct any issues that arise. In the meantime, he has declined requests to share his raw data, citing its proprietary nature.

As Wansink writes in the second addendum to the November blog post, “The Grad Student Who Never Said ‘No’:”

There’s been some good discussion about this post and some useful points of clarification and correction that will be made with these papers.  All of the editors were contacted when we learned of some of the inconsistencies, and a non-coauthor Stats Pro is redoing the analyses.  We’ll publish any changes as erratum (and we’ll have an analysis script).

Here is a description of the original research, from Wansink’s November post:

When [the graduate student] arrived, I gave her a data set of a self-funded, failed study which had null results (it was a one month study in an all-you-can-eat Italian restaurant buffet where we had charged some people ½ as much as others).  I said, “This cost us a lot of time and our own money to collect.  There’s got to be something here we can salvage because it’s a cool (rich & unique) data set.”  I had three ideas for potential Plan B, C, & D directions (since Plan A had failed).  I told her what the analyses should be and what the tables should look like.  I then asked her if she wanted to do them.

Ultimately, the student ended up with five papers. This concerned many readers, who posted comments such as:

This is a great piece that perfectly sums up the perverse incentives that create bad science. I’d eat my hat if any of those findings could be reproduced in preregistered replication studies. The quality of the literature takes another hit, but at least your lab got 5 papers out.

And:

What you describe Brian does sound like p-hacking and HARKing. The problem is that you probably would not have done all these sub-group analyses and deep data dives if your original hypothesis had p < .05…I have always been a big fan of your research and reading this blog post was like a major punch in the gut.

Wansink told us he was “hugely shocked” by the community’s reaction to his post, which he said was intended to illustrate

how you can take advantage of an opportunity, versus not taking advantage of an opportunity.

The “null result” (or “Plan A” for the dataset), Wansink explained, was his initial hypothesis that people eat less at a relatively cheap all-you-can-eat buffet than at one that costs more. Instead, he found people ate roughly the same amount, regardless of cost.

In response to the backlash to his blog, Wansink posted an addendum:

P-hacking shouldn’t be confused with deep data dives – with figuring out why our results don’t look as perfect as we want.

With field studies, hypotheses usually don’t “come out” on the first data run.  But instead of dropping the study, a person contributes more to science by figuring out when the hypo worked and when it didn’t.  This is Plan B.  Perhaps your hypo worked during lunches but not dinners, or with small groups but not large groups. You don’t change your hypothesis, but you figure out where it worked and where it didn’t.
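The statistical objection to this kind of after-the-fact subgroup hunting can be made concrete with a small simulation (a hedged sketch, not drawn from Wansink’s data or analyses; the function name and parameters are illustrative). Under a true null hypothesis, each test’s p-value is uniform on (0, 1), so running many uncorrected subgroup tests — lunches vs. dinners, small vs. large groups, and so on — sharply inflates the chance that at least one comes out “significant” by luck alone:

```python
import random

random.seed(42)

def chance_of_false_positive(n_tests, alpha=0.05, n_sims=100_000):
    """Under the null hypothesis, each test's p-value is uniform on (0, 1).
    Estimate the probability that at least one of `n_tests` independent
    tests reaches 'significance' at level `alpha`."""
    hits = sum(
        any(random.random() < alpha for _ in range(n_tests))
        for _ in range(n_sims)
    )
    return hits / n_sims

# One planned test keeps the false-positive rate near alpha (about 5%).
one_test = chance_of_false_positive(1)

# Ten uncorrected subgroup tests push it toward 1 - 0.95**10, roughly 40%.
ten_tests = chance_of_false_positive(10)

print(f"1 test:  ~{one_test:.2f}")
print(f"10 tests: ~{ten_tests:.2f}")
```

This is why exploratory “Plan B” findings are generally treated as hypothesis-generating until confirmed in a preregistered study, rather than reported at face value.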

Readers also linked to two blog posts critiquing Wansink’s article: one by Andrew Gelman, who noted that some of the papers reported different “n” values, and that one paper says the graduate student collected the data when, by Wansink’s own account, that wasn’t true. Another blog post, by Ana Todorovic at the University of Oxford, notes:

It is a post that aims to accentuate hard work, efficiency, capitalizing on opportunities, a collaborative spirit, and dedication. It ends up highlighting questionable research practices, misrepresenting exploratory research as confirmatory, and a lack of understanding why null results are important.

Wansink told us the “n” for some papers is different because not everyone who participated met all the criteria necessary for each study — for instance, in a study looking at the effect of dining companions on eating behavior, the researchers had to exclude everyone who was eating alone.

Eventually, Wansink posted a second addendum, letting readers know he alerted editors to the “inconsistencies” and would issue any corrections as necessary.

He told us, however, that he doesn’t expect the re-analysis to overturn any of his findings:

I doubt the significance levels will be different at all.

In a PubPeer thread about the paper flagged by Gelman, Wansink posted a comment — which he confirmed to us — suggesting an erratum is already underway:

When these contribution statements are asked, our general default is the graduate student or post-doc has usually collected and analyzed the data. When this was submitted, the person doing the formatting and submitting would have legitimately assumed that the grad student had again been the one to collect the data. In reality this had been done about 5 years earlier by a different grad student.

We have contacted the journal asking to rerun the analyses along with publishing an erratum. At that time we will also make this change.

That paper, “Low prices and high regret: how pricing influences regret at all-you-can-eat buffets,” was published in 2015 by BMC Nutrition.

Wansink noted that he’s going to wait to issue that erratum until he gets the results of the reanalysis, so he can make all of the changes at once. He noted that any errata he issues will also add references to the papers published from the same dataset. Since all of the papers were published close to each other, the authors forgot to add those references, Wansink said.

Here are the other papers:

Researchers have already published an analysis of the above papers in PeerJ, entitled “Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab,” which notes approximately 150 inconsistencies among the four papers. (See Gelman’s response here.) They write:

We contacted the authors of the four articles, but they have thus far not agreed to share their data.

Wansink acknowledged to us that he hasn’t shared the data, because it’s “tremendously proprietary.” He added in the second addendum to his blog:

Sharing data can be very useful – like with lab studies and large secondary data sets –  and in some instances being willing to do so (or a good reason why not) is a precondition to publishing in some journals.  When we collected the data for this study, our agreement to the small business and to the [institutional review board] would be that it would be labeled as proprietary and would not be shared because it contained data sensitive to the small town company (sales data and traffic data) and data sensitive to the small town customers (names, identifying characteristics, how many drinks they had, the names of the people they were sitting with, and so on).  This is data that cannot be depersonalized since sales, gender, and companions were central to some analyses.  (We had explained this when someone requested the data.) At the time we published these papers, none of the journals had the policy of mandatory data sharing, or we would have published these papers elsewhere.

Since 2005, Wansink has co-authored more than 200 papers, collectively cited more than 2800 times. If he issues any errata to these papers, they would be his first.


Written by Alison McCook

February 2nd, 2017 at 2:30 pm

Comments
  • Bob Roehr February 2, 2017 at 5:49 pm

    Deep dives are great, but they should be planned when the study is being constructed, not created after the fact in an attempt to “salvage” something from the experience.

  • MaryKaye February 2, 2017 at 5:52 pm

    I once heard, at a faculty hiring discussion, a colleague praise the candidate by noting that she had continued to work on her project while in labor with her child.

    If we hold such things up as our ideal of productivity, no wonder we get poor science. When “how many results per unit time?” is the metric, rather than “how good is the work?”, you can’t expect high quality work. (I have never been in labor myself, but I sincerely doubt I could do accurate statistics under those circumstances!)

    (On a personal level, I think this also contributes to the epidemic depression and other mental illness in the field: people are being held up to a ridiculous standard.)

  • L. J. Sloss, M.D. February 2, 2017 at 7:24 pm

    Reminds me of my own exposure to the research (volume) productivity metric and its perverse incentives and outcomes. I was in the audience at the pre-ACC Cardiology Rounds when the infamous John Darsee presented his 7 abstracts, a feat that drew gasps of admiration and a few non-ironic mutters of “incredible,” that turned out to be spot on. The only incentive worse than rewarding quantity is the elevation of grantsmanship to a position above all other values, monetizing the ulterior meaning of “productivity” for the benefit of the institution and its leaders.

  • Trent Gobsmack February 2, 2017 at 7:44 pm

    If doing research simply amounts to charging half a restaurant’s patrons half the price to see if people eat less or more over a month, I am not surprised the subsequent analyses were just as ridiculous. Sherlock Holmes of food? Come on – this was bad science to begin with, the data clearly nonsensical, and a month in one restaurant hardly valuable for anything.

  • Anon February 2, 2017 at 7:51 pm

    “Since 2005, Wansink has co-authored more than 200 papers.”
    Where does this info come from? I believe many of the papers in his Google Scholar profile are simply conference abstracts that are printed in a supplement to Journal of Nutrition Education & Behavior.

  • Arnoud van Vliet February 2, 2017 at 7:57 pm

    So Wansink will be the prosecutor, judge and jury? The article lacks a critical view on this very poor behaviour.

    Journals should become very strict: no data sharing, automatic retraction.

  • herr doktor bimler February 3, 2017 at 3:01 am

    the person doing the formatting and submitting would have legitimately assumed that the grad student had again been the one to collect the data.

    So there is someone in Wansink’s lab who formats papers, and submits them, and fills in the empty gaps, but doesn’t meet the criteria to be “author”?
    I guess I can’t really blame that person for not giving two f*cks, and simply guessing about what to write in the gap for “who did what”.

  • Klaas van Dijk February 3, 2017 at 5:14 am

    I would like to suggest to Dr. Wansink and to his co-authors of the four papers to read https://f1000research.com/articles/5-781/v1 (“Time for sharing data to become routine: the seven excuses for not doing so are all invalid”), and to reconsider their decision that others are not allowed to vet / validate / verify / scrutinize the raw research data of the four papers in question.

    I also would like to suggest to Dr. Wansink and to his co-authors to reflect on “Recommendation 7: Safeguarding and Storing of Primary Data” at page 74 at http://www.dfg.de/download/pdf/dfg_im_profil/reden_stellungnahmen/download/empfehlung_wiss_praxis_1310.pdf (“Proposals for Safeguarding Good Scientific Practice” of DFG, the main German funding agency).

    Recommendation 7 states for example:
    (a): “Primary data as the basis for publications shall be securely stored for ten years in a durable form in the institution of their origin.”
    (b): “Being able to refer to the original records is a necessary precaution for any group if only for reasons of working efficiency. It becomes even more important when published results are challenged by others.”
    (c): “Experience indicates that laboratories of high quality are able to comply comfortably with the practice of storing a duplicate of the complete data set on which a publication is based, together with the publication manuscript and the relevant correspondence.”

  • D Cameron February 3, 2017 at 6:42 am

    Wansink’s use of the phrases “our results don’t look as perfect as we want” and
    “[I haven’t] shared the data, because it’s ‘tremendously proprietary'” pretty much speaks for itself.

  • Anon2 February 3, 2017 at 9:21 am

    It would be interesting to know if “the postdoc who said no” is objectively a better scientist than “the grad student who said yes” (bearing in mind that we know the postdoc had to leave science and the grad student was predicted to do well). Admittedly, the postdoc probably said “no” purely because they were too busy rather than due to worries about the scientific method. But theoretically you could have this identical situation and the thorough scientist would have a failed career, and the sloppy scientist would rise rapidly.

  • Christine Zomorodian February 3, 2017 at 1:14 pm

    The first paragraph refers to a “post doc” that was unwilling to analyze data. What happened to that part of the story?? I see the second quote containing “[the graduate student];” is that the same person? This story was very hard to follow; I’d call it meandering and flabby. How about a tighter narrative that adheres to EITHER journalistic or technical writing style principles?

    • Sylvain Bernès February 14, 2017 at 9:10 am

      Wansink’s post mentions that “the post-doc left after a year (and also left academia)”.

  • Mary Kuhner February 3, 2017 at 4:09 pm

    I’d like to note in passing that paper titles along the lines of “Men eat more when dining with women” are TERRIBLE PRACTICE unless the authors have actually shown that. If, instead, they have shown that “Men eating at Italian buffets in [name of town] eat more when dining with women” they should be honest about the fact.

    How likely is such a finding to be cross-cultural? Not likely enough that it can be assumed without a single shred of evidence, surely! And then there’s SES and age group to consider.

    This sort of thing unfortunately works to validate the widespread prejudice that social science is not science.

    • Anonymous February 6, 2017 at 4:15 am

      I could not agree more, terrible practice. There is a similar problem in biology as well. Some authors can easily claim they have found another treatment that will cure cancer (although I most often come across these statements in the discussion / conclusion, and not so much in the title).

    • gago February 6, 2017 at 5:20 am

      And they should also put “$8” in the title. Honestly, how many of us would go on a date to an $8 all-you-can-eat Italian buffet? And who cares how much one eats when you go into a $8 all-you-can-eat Italian buffet? Let’s not forget the saying “garbage in, garbage out”.

    • conor dolan February 6, 2017 at 3:12 pm

      Another consideration is that, in all likelihood, the effect is random: i.e., not the same in each and every male. This does not rule out that some men reliably eat less when dining with women. If the effect is random, but treated as fixed the statistical tests may be incorrect (actual type I error rate > alpha). I believe that many studies employ fixed effects analyses, in situations where mixed effects are more plausible.

  • evans winter February 3, 2017 at 10:02 pm

    I see an inconsistency in the professor’s offhand description of his “A” hypothesis itself. He refers to differences in cost levels of restaurants (environments), but apparently collected both sets of data (full price vs. half price) in the same restaurant. Maybe he should reword the hypothesis; or is this why we have “hype”? Considering the number of variables, known and unknown, that would be introduced by applying the same methodology in more than one restaurant, this would appear to be an exercise in futility in the first place, except as an instructional tool or for leading a candidate to an idea for a thesis without giving it to her.

  • Sylvain Bernès February 14, 2017 at 9:43 am

    MaryKaye
    On a personal level, I think this also contributes to the epidemic depression and other mental illness in the field: people are being held up to a ridiculous standard.

    I think that the perspective developed by Brian Wansink in his analysis is a bit more balanced. At some point, he mentions “I think the person [the post-doc] was also resentful of the Turkish woman.” It remains to be seen if angry and/or resentful former members of a lab should be treated as persons having potential mental issues.
