When zoologists at the University of Oxford published findings in Science last year suggesting ducklings can learn to identify shapes and colors without training (unlike other animals), the news media was entranced.
However, critics of the study have published a pair of papers questioning the findings, saying the data likely stem from chance alone. Still, the critics told us they don’t believe the findings should be retracted.
If a duckling is shown an image, can it pick out another from a set that has the same shape or color? Antone Martinho III and Alex Kacelnik say yes. In one experiment, 32 of 47 ducklings preferred the pairs of shapes that matched what they were originally shown. In a second experiment, 45 of 66 ducklings preferred the original color. The findings caught the attention of many media outlets, including the New York Times, The Atlantic, and BuzzFeed.
Martinho told us:
We estimated statistically the probability with which our results could be expected by chance. That probability is extremely low, well beyond what is required normally in experimental science…One of our critics has even been quoted as saying to the press that some ducklings had random preferences. True, that’s the point of statistics! A minority even showed the opposite preference, and this is still fine, that’s why we estimated probabilities.
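To make "the probability with which our results could be expected by chance" concrete, here is a minimal sketch built on simplifying assumptions of our own: each duckling counts as a single yes/no preference, ducklings are independent, and "chance" means each one is equally likely to prefer either option. This is not necessarily the analysis Martinho and Kacelnik ran, and the helper name `chance_tail_probability` is hypothetical. Under these assumptions, the chance of at least k of n ducklings showing the preference is a binomial tail probability.

```python
from math import comb


def chance_tail_probability(k: int, n: int) -> float:
    """Hypothetical helper: P(at least k of n ducklings prefer the original)
    if each duckling independently prefers either option with probability 1/2."""
    # Count outcomes with k or more "original" preferences among the
    # 2**n equally likely outcomes; integer arithmetic keeps this exact.
    favorable = sum(comb(n, i) for i in range(k, n + 1))
    return favorable / 2**n


# Counts reported for the two experiments.
for label, k, n in [("shape", 32, 47), ("color", 45, 66)]:
    p = chance_tail_probability(k, n)
    print(f"{label}: {k} of {n} ducklings, chance of at least this many = {p:.4f}")
```

Under this coin-flip model, both probabilities come out well below 0.05. The critiques that follow turn on whether this is the right way to count choices and whether a small p-value is the right yardstick in the first place.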
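However, two separate research teams reanalyzed the data and reached different conclusions. One team, Jan Langbein and Birger Puppe, questioned whether the ducklings' choices showed a genuine preference at all, offering an analogy: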
If you had free choice and you [took] five pieces of black chocolate and six pieces of white chocolate, [the paper] would argue you have a distinct preference for the white chocolate. Which is not true because with the next piece you [might] choose the black chocolate again.
Langbein and Puppe used a different statistical test, the binomial test, and found that the conclusions held up only for shapes, not colors, a result that might make evolutionary sense. Langbein told us:
As ducklings hatch at any time of day and night, one can conclude that imprinting not only occurs during daytime but also when brightness is low and color is not a salient stimulus for learning about the mother and siblings.
Our critique does not warrant a retraction of the data, but a new interpretation.
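A binomial test asks how likely a split at least as lopsided as the observed one would be if every choice were a fair coin flip. The sketch below applies an exact two-sided version to the numbers in the chocolate analogy above (6 white pieces out of 11). It assumes a 50/50 null hypothesis, the function name is hypothetical, and it is an illustration of the test itself, not a reconstruction of Langbein and Puppe's reanalysis.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Hypothetical helper: exact two-sided binomial test of k successes in
    n trials against a fair 50/50 null hypothesis."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    observed = comb(n, k)
    # Under a fair null, P(X = i) = comb(n, i) / 2**n, so outcomes can be
    # compared by their integer counts. Sum every outcome that is no more
    # likely than the observed one (the usual exact two-sided p-value).
    as_or_more_extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return as_or_more_extreme / 2**n


# Chocolate analogy: 6 white pieces out of 11.
print(two_sided_binomial_p(6, 11))  # 1.0
# Hypothetical comparison: the same 6:5 ratio over 1,100 choices.
print(two_sided_binomial_p(600, 1100))
```

A 6-to-5 split yields a p-value of 1.0, no evidence of any preference, while the same ratio spread over a hypothetical 1,100 choices gives a p-value far below 0.05. How much a majority means depends on how many choices stand behind it.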
Another researcher argued that the entire study rested on a statistical method he considers discredited. Jean-Michel Hupé of the Université de Toulouse contended that a p-value below 0.05 does not demonstrate a real effect; it shows only that the result would be surprising if chance alone were at work.
The usual practice for about 60 years is to consider that if your observation is surprising, then the null hypothesis is probably wrong. But how “probably wrong”, you don’t know, and you don’t know it whatever the p-value….p-values are pretty useless to make any inference on models or parameters of models, and this has been very well known for decades (even though I, like all my colleagues, were taught to use p-values). Since last year, this is also official with the American Statistical Association publishing a statement about it. The usage of p-value is no longer controversial. It’s just wrong.
Instead, Hupé argued for reporting confidence intervals, which give a range of plausible values for the quantity being estimated, such as the true proportion of ducklings that prefer the original shape. Even confidence intervals have issues, he said in an email, but:
practically, you know that you have **about** 95% chance that the true value is within your 95% CI…The CI allows you to interpret your data, the p-value does not.
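As a sketch of the approach Hupé recommends, the snippet below computes approximate 95% confidence intervals for the two proportions reported earlier (32 of 47 for shapes, 45 of 66 for colors). It uses the Wilson score interval, which is our choice for illustration; Hupé's reanalysis may have estimated different quantities or used a different method, and the function name is hypothetical.

```python
from math import sqrt


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Hypothetical helper: Wilson score interval for the proportion k/n.
    z = 1.96 gives an approximately 95% interval."""
    p_hat = k / n
    denom = 1 + z**2 / n
    # Wilson centre is pulled slightly toward 1/2 relative to p_hat.
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


for label, k, n in [("shape", 32, 47), ("color", 45, 66)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Both intervals are wide, running from just above one half to roughly four fifths, so the data are compatible with anything from a weak to a fairly strong preference. That spread is information a single p-value does not convey.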
Hupé argued that his reanalysis showed that few firm conclusions could be drawn from the duckling data:
What if Martinho and Kacelnik had presented their data the way I suggest? Their study would have looked like a promising pilot study, describing a clever paradigm. That’s definitely worth publishing. I doubt however that Science editors would have considered it to be worth publishing in their high-impact journal. But that’s their problem, not ours. They are responsible for promoting good stories rather than humble facts. My hope is that readers will understand better statistics after reading my comment.
Still, Hupé agreed the paper should not be retracted:
Retraction is certainly not warranted (unless you decide to retract most papers, which based their conclusions on p-values), and I hope that my technical comment is enough as a cautionary notice.
A representative of Science agreed, telling us the journal
is not considering an Editorial Expression of Concern, nor a Retraction, for this paper.
The published exchanges allow for transparent debate about the data and conclusions, the spokesperson said, and the Science editorial team believes the researchers addressed any technical issues constructively.
Meanwhile, Martinho and Kacelnik, who also published a response to the criticisms, are working to replicate their initial observations.
Martinho told us:
Both our group and other experimentalists have conducted more experiments testing the same and related ideas. Our practice is to report results through the appropriate, peer-reviewed media, when they are fully analysed. I can only say that results so far are reassuringly strong.