Over the past few years, Nature has published editorials extolling the virtues of replication, concluding in one that “We welcome, and will be glad to help disseminate, results that explore the validity of key publications, including our own.” Mante Nieuwland, of the Max Planck Institute for Psycholinguistics, and colleagues were encouraged by that message, and submitted one such replication attempt to Nature Neuroscience. In a three-part guest post, Nieuwland describes what happened when they did, and discusses whether reality lives up to the rhetoric. Here’s part one:
On April 10th 2018, eLife published the first large-scale direct replication study in the field of cognitive neuroscience, co-authored by 22 colleagues and myself. This publication detailed a replication effort that spanned 9 laboratories and attempted to replicate a high-impact 2005 publication in the prestigious journal Nature Neuroscience from DeLong, Urbach and Kutas (hereafter referred to as DUK05). People often ask why our replication study was not published in Nature Neuroscience, especially in light of its recent public commitments to replication research (here and here). It certainly wasn’t for lack of trying on our part.
In this post, I offer a behind-the-scenes account of what happened when we tried to replicate DUK05 and submitted our results to Nature Neuroscience, illustrating some of the obstacles faced by researchers who perform and try to publish replication research. These experiences are not unique by any means (for examples, see here and here). I am sharing our experiences openly in the hope that Nature and other publishers can become more transparent about their policies regarding replication research, and I hope that our experiences will serve and support anyone with an interest in replication research.
The text contains literal quotes from emails that I sent, but only summarizes or paraphrases the content of emails that I received, because I did not obtain permission to quote from those emails.
The original study
DUK05 argued that readers routinely predict the sound of an upcoming word. Their evidence in support of this conclusion came from comparing event-related potentials (ERPs – brain responses) to articles and to the subsequent nouns. The nouns were either expected or unexpected given the sentence context (for example, “It was a windy day so the boy went outside to fly a kite/an airplane...”, with each word presented successively in isolation). The idea was that if people predict the noun, then they activate its sounds based on that prediction, and this activation even extends to the particular phonological form the determiner article should take.
They found that the amplitude of the N400 component for a given word gradually increased when that word was more unexpected, confirming a previously-reported pattern. Crucially, DUK05 reported a similar brain response pattern on the articles (“a/an”), presumably because words like ‘an’ tell the reader that the expected word ‘kite’ cannot be next (because of the English rule that the use of ‘a/an’ depends on the sound of the next word – which, by the way, doesn’t always have to be a noun, e.g., “an ugly kite”). The results from this elegant design thus supported the idea that people predicted not just the meaning of words like “kite,” but also a rather detailed aspect of the word form, namely its first sound. This finding has had a huge impact on the field and on how central a role “prediction” plays in our theories of how our brains rapidly construct sentence meanings from speech.
DUK05 is a citation classic in language science and cognitive neuroscience (over 750 citations since 2005) that forms an empirical cornerstone of influential theories of language comprehension, and that features in all major textbooks and reviews on language comprehension. However, DUK05 has never been successfully replicated, several aspects of its statistical analysis are problematic (for example, it disregarded known sources of variance in the data, which can lead to spurious results; though the analysis was not that unusual at the time), and aspects of its analysis were contingent on the data, yielding inflated effects (e.g., see here and here). For a detailed methodological critique, see the replication study and here.
Because DUK05 had such an enormous impact on psycholinguistics but was yet to be replicated, Falk Huettig and I decided to embark on a direct replication study at the University of Edinburgh, where I worked at the time. At that point, we were not yet planning to run the study across multiple labs.
Our replication study
Initially, we wanted to generate all the sentence materials anew, but quickly realized it would be better to obtain the original materials if we wanted to attempt a direct replication. In three e-mails to the original authors, I requested their sentence materials (plus comprehension questions and norming test data) for the purpose of a clean and straightforward replication study. My third email pointed out the publisher’s policy on data and materials availability, which states that “a condition of publication in a Nature journal is that authors are required to make unique materials promptly available to others without undue qualifications”.
A little while later, we were lucky enough that the authors made the 80 sentences available on a ResearchGate account, and we started adapting the materials to British English and re-norming them in Edinburgh. The comprehension questions belonging to the experiment were unfortunately never made available. I gave up trying to get the questions, in part because DUK05 had not actually used the performance on those questions to exclude participants, so the questions were, strictly speaking, not part of the analysis that we wanted to replicate.
In light of more recent discussions on required power, we also realized that a single replication experiment would probably not be a sufficient contribution and we needed a larger-scale approach (see here, here and here), so we started inviting other researchers in the UK to join the replication effort. We approached labs in 12 different universities, 11 of which agreed initially. We ended up with 9 after 2 dropped out because they were not able to complete the study in the planned time frame.
Importantly, our multi-laboratory replication study tackled all the methodological and statistical issues with DUK05 that have come up in recent years. We tested a sample more than 10 times greater than that of DUK05, we employed both the original analysis and an improved statistical analysis, and we pre-registered all the crucial analyses to provide a time-stamped proof that the analyses were not tailored to achieve a certain result.
Our pre-registered analyses allowed us to conclude that we replicated the patterns that DUK05 observed for the nouns, but not for the articles: none of the analyses yielded a statistically significant effect for the determiner articles. In Bayesian analyses that were not pre-registered, we showed that the original statistical analysis approach successfully replicated the size and direction of the noun-effect, but failed to replicate both the size and direction of the article-effect. In additional Bayesian analyses, we showed that the improved statistical analysis yielded an article-effect that was likely non-zero, but that it was far smaller than what DUK05 reported and too small to observe without very large sample sizes. In other words, our results did not support DUK05’s conclusion that the a/an manipulation generated strong evidence that participants predicted the sound of an upcoming word.
We submitted our study to Nature Neuroscience in February of 2017.
In tomorrow’s installment, find out what happened after the team submitted their manuscript — which turned out to be the first of two submissions.