As the bumper sticker says, “Regime change starts at home.” Seems to be the case with scientists these days.
This month we have seen commendable instances of researchers retracting papers after identifying flaws in their own data — an outbreak of integrity that has us here at Retraction Watch applauding. (We’ve even created a new category, “doing the right thing,” at the suggestion of a reader.)
Today’s feel-good story comes from the lab of Karel Svoboda, of the Howard Hughes Medical Institute’s Janelia Farm Research Campus, in Ashburn, Va. Back in June, Svoboda and his colleagues published “Whisker Dynamics Underlying Tactile Exploration” in the Journal of Neuroscience. Here’s what the abstract had to say about the study:
Rodents explore the world by palpating objects with their whiskers. Whiskers interact with objects, causing stresses in whisker follicles and spikes in sensory neurons, which are interpreted by the brain to produce tactile perception. The mechanics of the whisker thus couple self-movement and the structure of the world to sensation. Whiskers are elastic thin rods; hence, they tend to vibrate. Whisker vibrations could be a key ingredient of rodent somatosensation. However, the specific conditions under which vibrations contribute appreciably to the stresses in the follicle remain unclear. We present an analytical solution for the deformation of individual whiskers in response to a time-varying force. We tracked the deformation of mouse whiskers during a pole localization task to extract the whisker Young’s modulus and damping coefficient. We further extracted the time course and amplitude of steady-state forces during whisker–object contact. We use our model to calculate the relative contribution of steady-state and vibrational forces to stresses in the follicle in a variety of active sensation tasks and during the passive whisker stimuli typically used for sensory physiology. Vibrational stresses are relatively more prominent compared with steady-state forces for short contacts and for contacts close to the whisker tip. Vibrational stresses are large for texture discrimination, and under some conditions, object localization tasks. Vibrational stresses are negligible for typical ramp-and-hold stimuli. Our calculation provides a general framework, applicable to most experimental situations.
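(A back-of-the-envelope aside for readers who want a feel for the “elastic thin rod” framing: the textbook Euler-Bernoulli estimate below is a deliberately simplified sketch, not the authors’ model. Real whiskers taper toward the tip, which shifts the numbers, and every parameter in the snippet is an illustrative guess rather than a measured whisker property.)

```python
import numpy as np

# Illustrative sketch only, not the authors' model: natural frequencies of a
# uniform cylindrical cantilever from Euler-Bernoulli beam theory. Real whiskers
# taper toward the tip, which shifts these values; the parameters below are
# rough guesses, not measured whisker properties.

L_whisker = 16e-3   # length (m), assumed
r         = 30e-6   # radius (m), assumed
E         = 3e9     # Young's modulus (Pa), assumed
rho       = 1.2e3   # density (kg/m^3), assumed

I = np.pi * r**4 / 4   # second moment of area of a circular cross-section
A = np.pi * r**2       # cross-sectional area

# First four roots of the cantilever frequency equation cos(bL)*cosh(bL) = -1
beta_L = np.array([1.8751, 4.6941, 7.8548, 10.9955])

omega = (beta_L / L_whisker) ** 2 * np.sqrt(E * I / (rho * A))   # rad/s
print("Approximate natural frequencies (Hz):", omega / (2 * np.pi))
```

Plugging in measured geometry and material constants changes the values but not the scaling, omega_n ∝ (r / L^2) * sqrt(E / rho).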
But, well, shortly thereafter, they said, the scientists caught wind of a problem with their study. According to the retraction notice:
At the request of the authors, The Journal of Neuroscience is retracting “Whisker Dynamics Underlying Tactile Exploration” by S. Andrew Hires, Alexander L. Efros, and Karel Svoboda, which appeared on pages 9576–9591 of the June 5, 2013 issue. The authors report, “After publication we discovered that higher-order eigenmodes were incorrectly summed when calculating the time-dependence of whisker shape during touch with a rigid object. Correction of this error revealed that our boundary conditions were inappropriate for the whisker-object interactions treated in our paper. Modification of these boundary conditions will alter the results presented in Figures 6–11. We therefore wish to withdraw the article. A corrected treatment will be published in the future. We apologize for any confusion caused by this error.”
Svoboda gave us a bit more information about what happened:
We were applying our model to different experimental conditions examined by another group (George Debrégeas & Dan Shulz) and found a mismatch between their and our results. We then found an error in one of our computer routines used for calculating the total displacement of a whisker through time during contact with an object. The error caused the time-dependence of each eigenmode to behave as if it were the first eigenmode when calculating vibrational whisker displacement…
After fixing the code, the calculation showed that vibrations would force the whisker into the pole during force application, a physical impossibility for a rigid object. Thus our initial boundary conditions were insufficiently constrained. This statement in the discussion was no longer correct.
We are working through all the consequences of the new approach. There will be differences in how vibrations propagate along the whisker and how vibrational modes interact. The results in Figures 6–11 will be quantitatively different.
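To make that class of bug concrete, here is a hypothetical sketch (not the authors’ routine; the mode shapes, frequencies, and amplitudes are made up) of how a modal superposition goes wrong when every eigenmode is given the first mode’s time-dependence:

```python
import numpy as np

# Hypothetical sketch of the class of bug described above, not the authors'
# actual routine. The total displacement of a vibrating rod is a sum over
# eigenmodes, each with its own shape phi_n(x) and its own frequency omega_n;
# the bug amounts to letting every mode evolve in time at the first mode's
# frequency.

def total_displacement(x, t, amplitudes, omegas, mode_shapes, buggy=False):
    """Superpose modal contributions a_n * phi_n(x) * cos(omega_n * t)."""
    u = np.zeros_like(x, dtype=float)
    for a_n, omega_n, phi_n in zip(amplitudes, omegas, mode_shapes):
        omega_used = omegas[0] if buggy else omega_n   # buggy: every mode ticks at omega_1
        u += a_n * phi_n(x) * np.cos(omega_used * t)
    return u

# Toy usage with made-up mode shapes, frequencies, and amplitudes
x = np.linspace(0.0, 1.0, 200)
mode_shapes = [lambda x, n=n: np.sin((n + 0.5) * np.pi * x) for n in range(3)]
omegas = np.array([1.0, 6.3, 17.5])
amplitudes = np.array([1.0, 0.3, 0.1])

u_correct = total_displacement(x, 0.5, amplitudes, omegas, mode_shapes)
u_buggy = total_displacement(x, 0.5, amplitudes, omegas, mode_shapes, buggy=True)
print("Max difference between correct and buggy sums:", np.max(np.abs(u_correct - u_buggy)))
```

The second problem described above, the whisker apparently moving into the rigid pole, is the sort of thing a physical-plausibility check on the output can flag, for instance asserting that the computed displacement at the contact point never crosses the pole surface.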
As we’ve said before, this sort of retraction deserves kudos — and it underscores that retractions don’t have to be a blot on the record when there’s no reason to smell a rat.
What a bummer – the common nightmare for people who write software to analyse data: identifying a ‘fatal’ flaw after you publish. Could happen to anyone, and hats off to the authors, who went back to figure out the problem in their software.
I feel that this type of problem exists in the imaging (microscopy) world as well.
Yes – at several levels. One is the black box of the innocent user applying commercial software without going into the details. There’s no real excuse for that; it’s generally due to bad mentoring or poor management/training in the imaging facility. Another is when you write your own code. Data analysis can take a month, then you go back, re-do it, etc.
Retractions and corrections are always going to be a blot on the record — not an ethical blot if they result from honest error, but still a confidence-reducer. This sort of thing is a bit like a child who breaks an expensive vase and then immediately tells his mother about it — you can applaud the honesty, but you still have to make it clear that the act itself was undesirable. The last thing we want is a culture where people feel free to publish carelessly and then retract or correct anything that turns out to be wrong.
The second half of your post contradicts the first half.
Do you really think that anyone wanting to hire Andrew (no pun intended) after his postdoc is going to think twice about it because of the retraction? This is one honest dude and everyone will commend him for this. Kudos Andrew.
It’s Karel Svoboda, not Karl.
Fixed. Thanks!
It’s been a great comfort to me throughout my career as a maker of scientific software that my boss, a brilliant and famous scientist, once released code that carried out a phylogenetic analysis based on *minimum* likelihood instead of maximum. (It turns out that the wrongest possible tree isn’t particularly exciting, alas. Just very long.) I haven’t been quite that wrong yet.
I have released software with bugs. I despair of making software with no bugs at all. All we as individuals can do, with things of this complexity, is make it as well as possible, listen when people find problems, and be honest.
We as a community could additionally insist on good testing as part of any grant-funded work involving software design; insist on open source, so that others can help identify the bugs; and avoid rewarding people who promise to write good software in impossibly short amounts of time. Unfortunately in the current grant climate that last one is a real issue. Say you’ll do it in a year, when you know that won’t leave time for testing? Or say it’ll take two years, and don’t get funded, or get funded with a big cut so you’ll only have one year anyway?
I recently asked for funding for three programmers on a project, and got a review that said, pretty much in so many words, “MaryKaye can program, she should do this herself.” Then you put out shoddy code, because if you don’t put something out the grant won’t be renewed.
I worked as a programmer eons ago before I went to grad school. Perhaps because my job dealt with money, we were trained to be extremely careful in testing, verifying and keeping track of everything. I was kind of astonished when I got to grad school to see how cavalier people’s attitudes were towards the use of software, and I still notice how infrequently people really test it. There seems to be something of an attitude that code can be written quickly, will do the job, and doesn’t really need extensive testing. In my job, we were trained to test as many unusual situations as we could think of and make sure our program did what we expected in all of them.
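For what it’s worth, the kind of edge-case testing described above might look like the snippet below; the helper function and its expected behaviour are invented purely for illustration.

```python
import numpy as np
import pytest

# Hypothetical illustration of the edge-case testing described above. The
# helper and its expected behaviour are invented; the point is exercising the
# unusual inputs, not the function itself.

def peak_to_peak(signal):
    """Return max(signal) - min(signal), ignoring NaN samples."""
    signal = np.asarray(signal, dtype=float)
    if signal.size == 0 or np.all(np.isnan(signal)):
        raise ValueError("peak_to_peak needs at least one finite sample")
    return float(np.nanmax(signal) - np.nanmin(signal))

def test_constant_signal_has_zero_range():
    assert peak_to_peak([2.0, 2.0, 2.0]) == 0.0

def test_nan_samples_are_ignored():
    assert peak_to_peak([1.0, np.nan, 3.0]) == 2.0

def test_empty_input_is_rejected():
    with pytest.raises(ValueError):
        peak_to_peak([])
```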
I think new codes need reliable benchmarks in order to verify their accuracy. I work in computational chemistry, and a lot of effort is focused on benchmarking new codes and methods against known problems to make sure they actually work. For things like microscopy and imaging, I haven’t seen the same level of benchmarking. It wouldn’t be hard to run a new method/program on a known problem (some agreed-upon standard) just to verify that the new way works.
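A minimal sketch of that style of benchmark, assuming Python with NumPy/SciPy: solve a problem with a known analytical answer (here a damped harmonic oscillator, loosely in the spirit of the whisker-vibration discussion above) and check the numerical result against it before trusting the code on real data.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical benchmark in the spirit described above: before trusting a
# numerical integrator on real data, run it on a problem with a known
# analytical answer. Here, an underdamped oscillator
# x'' + 2*zeta*w0*x' + w0**2*x = 0; frequency and damping are illustrative.

w0, zeta = 2 * np.pi * 50.0, 0.05
x0, v0 = 1.0, 0.0

def rhs(t, y):
    x, v = y
    return [v, -2 * zeta * w0 * v - w0**2 * x]

t = np.linspace(0.0, 0.2, 2001)
numerical = solve_ivp(rhs, (t[0], t[-1]), [x0, v0], t_eval=t, rtol=1e-9, atol=1e-12)

# Closed-form underdamped solution for the same initial conditions
wd = w0 * np.sqrt(1 - zeta**2)
analytic = np.exp(-zeta * w0 * t) * (
    x0 * np.cos(wd * t) + (v0 + zeta * w0 * x0) / wd * np.sin(wd * t)
)

print("Max absolute error vs. analytic solution:",
      np.max(np.abs(numerical.y[0] - analytic)))
```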