When an ecologist realized he’d made a fatal error in a 2009 paper, he did the right thing: He immediately contacted the journal (Evolutionary Ecology Research) to ask for a retraction. But he didn’t stop there: He wrote a detailed blog post outlining how he learned — in October 2016, after a colleague couldn’t recreate his data — that he had misused a statistical tool (using R programming), which ended up negating his findings entirely. We spoke to Daniel Bolnick at the University of Texas at Austin (and an early career scientist at the Howard Hughes Medical Institute) about what went wrong with his paper “Diet similarity declines with morphological distance between conspecific individuals,” and why he chose to be so forthright about it.
Retraction Watch: You raise a good point in your explanation of what went wrong with the statistical analysis: When you eyeballed the data, they didn’t look significant. But when you plugged in the numbers (it turns out, incorrectly), they were significant – albeit weakly. So you reported the result. Did this teach you the importance of trusting your gut, and the so-called “eye-test,” when looking at data?
Daniel Bolnick: Only partly. The fact is, there really can be cases where trends may be weak but significant. Or, real trends may exist after controlling for many other variables, and therefore a plot of y ~ f(x) might look like a shotgun scatter, when in fact y does depend on x after variable(s) z are accounted for. In other words, 2-dimensional plots might fail to capture actual trends. So I still do believe that our eye-test, while useful, is not a sufficient basis for judgement. That’s why we do statistics. We just need to do the statistics correctly.
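To make the point about two-dimensional plots concrete, here is a minimal R sketch (invented data and variable names, not code from the retracted paper) in which y genuinely depends on x, yet a correlated covariate z suppresses the bivariate trend: the plain scatter of y against x looks like a shotgun blast, while a regression that accounts for z recovers the effect.

```r
# Hypothetical illustration: y depends on x, but a correlated covariate z
# masks the marginal (two-dimensional) relationship.
set.seed(1)
n <- 200
x <- rnorm(n)
z <- x + rnorm(n)                  # z is correlated with x
y <- x - z + rnorm(n, sd = 0.5)    # y truly depends on x (and on z)

summary(lm(y ~ x))$coefficients      # slope on x is near zero and non-significant
summary(lm(y ~ x + z))$coefficients  # slope on x is close to 1 and highly significant
```

Plotting y against x for these simulated data shows no visible trend, even though the dependence on x is real once z is accounted for.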
RW: So your supposed result – that animals that are phenotypically more similar have more similar diets – turns out not to be true. Does that surprise you, given that the assumption was that it would hold?
DB: Absolutely, it surprises me. I do think that I was predisposed to accept the “significant” trend despite my “eye-test” being negative, precisely because I really truly expected this result to hold. In fact, I still believe it holds, and I just need to do a better job of measuring diet or morphology. I have other data, published elsewhere, that still leads me to strongly suspect the phenomenon holds true, even if it didn’t show up with this particular method in this particular population/year.
RW: Is this your first retraction? If so, how did it feel?
DB: Yes. It felt horrid. A sinking feeling in my gut, and I had a hard time sleeping that night. Once I found the mistake, I wrote to the journal immediately. Their official retraction notice is coming out in the next issue.
RW: We noticed the journal removed the paper entirely – which goes against the retraction guidelines issued by the Committee on Publication Ethics. Is that why you decided to write the blog?
DB: No – and I have a copy of the paper, for anyone who wants to read it.
RW: You write about your mistake and the ultimate retraction – something you didn’t enjoy doing, as you note: “It certainly hurt my pride to send that retraction in, as it stings to write this essay, which I consider a form of penance.” Most researchers don’t write public blog entries when they retract papers – why did you choose to do so?
DB: It was my penance. 500 years ago I might have walked through the town square whipping my back. This seems a bit more civilized, and preferable. Actually, what interested me most in writing the blog post was the notion of errors in statistical code. The R programming language has taken the biology world by storm. It’s what cool kids use. I’ve always considered myself a bit of an “R-vangelist”. In that regard, I’ve argued a lot (in a very friendly way) with Andrew Hendry, who runs the blog in question and hasn’t taken to R. This experience gives me pause, a bit. But really it is a mixed lesson. I only figured this out because I saved my R code and could retrace every step of every analytical decision to find the mistake. That’s a good thing about R. But the mistake happened because I, like many others, wasn’t an expert programmer in general, or in R in particular, at the time. So the reliance on R predisposes us to these kinds of mistakes. The lesson here is that R is a two-edged sword; we have to be careful with it. That’s what I wanted people to learn, to avoid future mistakes.
RW: You note that other researchers make similar statistical errors, which should ideally be checked during review. Yet you admit that requires time and expertise on the part of reviewers, which we don’t always have. So what’s the solution, in your opinion?
DB: I’m taking over as Editor-in-Chief of The American Naturalist, the oldest scientific journal in the US, in January 2018. So your question touches on something I’m thinking a lot about from a practical standpoint. Here are the barriers as I see them, and some possible ways forward:
- Not every researcher uses statistical tools that leave a complete record of every step, in order. Given the potential problems with coding errors, we shouldn’t require people to do so. That means this probably can’t be an obligatory part of review.
- Any journal that stuck its neck out and required well-annotated reproducible code + data for the review process would just see its submissions plummet. This needs to be coordinated among many top journals.
- Reviewers would either say “no” to review requests more often, or do a cursory job more often, if we required that they review R code. And many great reviewers don’t know how to review code properly either.
- Make this optional at first. To create an incentive, we could put some sort of seal-of-approval on papers that went through code review. As a reader, I’ll trust a paper more, and be more likely to cite it, if it has that seal. Authors will want it. Readers will value it.
- Find a special category of reviewer / associate editor who can check code. This could be separate from the regular review process and might not require subject-matter expertise. The easiest way to do this is to hire someone and charge authors a small fee to have their paper checked to get the seal-of-approval.
- A halfway version is to require that code be provided along with the data, both during review and upon publication. No formal review of the code would be required, but reviewers MIGHT opt to do so. That creates just enough fear in authors to give them an incentive to proofread and annotate their own code well. They may find errors in doing so. Basically the proofreading falls back to the authors, but we entice them to self-proofread a bit more carefully than they otherwise might.