Did that headline make sense? It isn’t really supposed to – it’s a summary of a recent satirical paper by Columbia statistician Andrew Gelman and Jonathan Falk of NERA Economic Consulting, entitled “NO TRUMP!: A statistical exercise in priming.” The paper – which they are presenting today during the International Conference on Machine Learning in New York City – estimates the effect of the Donald Trump candidacy on the use of contracts played without a trump suit (so-called No Trump contracts) in the game of bridge. But, as they told us in an interview, the paper is about more than just that.
Retraction Watch: You have a remarkable hypothesis: “Many studies have demonstrated that people can be unconsciously goaded into different behavior through subtle psychological priming. We investigate the effect of the prospect of a Donald Trump presidency on the behavior of the top level of American bridge players.” Can you briefly explain your methodology, results and conclusions?
Jonathan Falk: Bridge is a card game in which two teams engage in a bidding phase which ends in a contract, followed by cardplay which results in the contract being achieved, or not. About 30 percent of all hands played by elite bridge players are so-called No Trump contracts. We used data from a contemporary US tournament, the same tournament played in 1999, and a similarly elite contemporary European tournament to see if there were observable differences in the behavior of bridge players after the political rise of Donald Trump. In a world in which #NeverTrump has become an internet meme, we wanted to see if bridge players would alter either their bidding or card play to reflect the unconscious pull of the Trump meme.
We found a small change in the propensity of contracts to end in No Trump, but this difference was not statistically significant. We did, however, find a statistically significant difference between the 1999 and 2015 US tournaments in the fraction of No Trump contracts successfully made. In the later period, successful No Trump contracts were a much higher percentage of all contracts. The European results, while similar to the later US results, were not significantly different from the earlier US results. We conclude from this that elite bridge players, despite their obvious incentives to bid and play as well as possible regardless of any outside political influences, have nonetheless chosen to allow more No Trump contracts to be made in the post-Trump era. This accords with our initial hypothesis that unconscious factors will change otherwise rational thought processes, as in the embodied cognition literature.
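For readers curious what this sort of comparison looks like in practice, here is a minimal sketch of a two-sample proportion test of the kind described above. The counts are invented placeholders, not the tournament data, and the authors’ actual analysis may differ in its details.

```python
# Illustration only: hypothetical counts, NOT the paper's tournament data.
from scipy.stats import chi2_contingency

# Rows: tournaments (1999 US, 2015 US); columns: [No Trump contracts made, not made].
table = [[300, 150],   # hypothetical 1999 counts
         [360, 120]]   # hypothetical 2015 counts

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")
# A small p-value is what gets reported as a "statistically significant" change
# in the fraction of No Trump contracts made between the two tournaments.
```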
RW: This paper is clearly a satire, correct? Why did you decide to write a satire, and why on this topic?
JF: We’re glad you recognize it as satire. We hope everyone does. But what worries us is that highly publicized findings that are not much better motivated or supported are reported uncritically all the time. We stress that while this paper is satire, the methodology is a common one, the statistics are accurately reported, and for all we know, the conclusion might even be correct! But what concerns us is that in the quest for headline-grabbing results, the researcher’s motivation to cut corners through self-delusion (if not outright fraud) leads to what one of us (Gelman) has called “the garden of forking paths,” the ability to make subtle choices along the path to publication that make results, true or not, unreliable.
All scientific study focuses in one way or another on a difference between a treatment group and at least an implicit control group. But “difference” is not a well-defined thing. It can have dozens of dimensions, a variety of sizes, and be embedded in different amounts of noise. Careful science is the separation of the “real” difference from the noise. “Garden of forking paths” science exploits, where feasible, the noise to amplify the conclusions.
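To see how forking paths can manufacture findings out of noise, here is a small, self-contained simulation (not taken from the paper): there is no true effect anywhere, yet a researcher who tries enough arbitrary subgroup analyses and stops at the first p < 0.05 will “find” something far more often than the nominal 5 percent of the time.

```python
# Illustration only: pure noise, analyzed from many "angles" (forking paths).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_studies, n_per_group, n_splits = 1000, 50, 10
studies_with_a_finding = 0

for _ in range(n_studies):
    treatment = rng.normal(size=n_per_group)   # no true effect anywhere
    control = rng.normal(size=n_per_group)
    # Try several arbitrary post-hoc subgroups; stop at the first p < 0.05.
    for _ in range(n_splits):
        subgroup = rng.random(n_per_group) < 0.5
        if subgroup.sum() > 2 and ttest_ind(treatment[subgroup], control[subgroup]).pvalue < 0.05:
            studies_with_a_finding += 1
            break

print(f"Pure-noise studies yielding a 'significant' result: "
      f"{studies_with_a_finding / n_studies:.0%}")   # far above the nominal 5%
```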
Andrew Gelman: We’re poking fun at a whole genre of scientific studies that have been published in top journals (Science, Nature, Psychological Science, Proceedings of the National Academy of Sciences, etc.) and have been publicized, often uncritically, in trusted news sources such as the New York Times and NPR. These studies typically follow a pattern: small sample size, noisy measurements, and, most crucially, enough flexibility in the motivating theory and the statistical analysis so that it’s just about always possible to get statistical significance by looking at the data from enough different angles. We’ve discussed literally zillions of these examples on our blog, and I covered some of my favorites in a recent post for Retraction Watch. But they just keep coming and coming.
Now, don’t get me wrong – I’m not saying that these researchers are all acting in bad faith. And I’m not even saying that these conclusions are all wrong. For example, the political scientists Larry Bartels and Chris Achen wrote a paper with some data analysis purporting to show that the presidential election of 1916 was determined in part by shark attacks on the Jersey shore. That’s pretty funny, and I don’t really believe it; I don’t find their analysis convincing. But Bartels and Achen are serious researchers, they know a lot of political science, and who knows, maybe they’re right on this and I’m wrong. It’s hard to know with this sort of study: the data are historical, so there’s no real way to replicate. The point is, these guys are not jokers, and their work bears looking at even if we disagree.
Other studies, though, I think we really can just throw in the trash.
RW: You have a great section on future attempts at reproducing your findings, in which you rebut all predicted criticisms. Did you enjoy writing this section, and can you speak more about how it relates to the larger – and all too real – problem of reproducibility?:
We expect that after these results appear in NPR, TED, Gladwell, and the prestigious Proceedings of the National Academy of Sciences, there will be pushback from the inevitable replication bullies, those uncreative types who seem to exist only to criticize. To save everyone trouble, we will preregister now the following responses to any future failed replications: (1) The replication was unfaithful to our original study because of various details not mentioned in this publication because of lack of space; (2) The replication was successful in demonstrating a heretofore unhypothesized interaction with outdoor temperature, relationship status, parental socioeconomic status, or some other crucial variable not included in our original study; and (4) Had the replication used a large enough sample size, it would surely have been statistically significant. In short, disbelief is not an option. The results are not made up, nor are they statistical flukes. You have no choice but to accept that the major conclusions of these studies are true.
JF: Reproducibility is hard in the best of circumstances. What has dismayed both of us is a defensive attitude on the part of original authors. This attitude is completely understandable, but dismaying nonetheless. We wrote this section parodying the sorts of responses we see all the time. And if the objections are pre-registered, they must have more force! We know of very few examples where an original author ever said: “Hmmmm… you did exactly what I did and your effect isn’t nearly as big as mine. I guess I was fooled by a statistical significance filter into believing something that isn’t true.” But we all believe things that aren’t true, we all rely on noisy evidence to reach those conclusions, and extraordinary claims are less likely to be true than false even with p<0.05. We should be saying things like this a lot, instead of carping about the exact conditions under which the result obtained or searching for hitherto unsuspected mediators of the effect or subgroups in which the effect is mysteriously absent or augmented.
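As a rough sketch of that significance filter (an illustration, not something from the paper), the simulation below repeatedly studies a genuinely tiny effect with noisy measurements and then averages only the estimates that happened to clear p < 0.05, roughly what a publication filter does. The surviving estimates overstate the true effect several times over.

```python
# Illustration only: the "statistical significance filter" on a tiny true effect.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
true_effect, n_per_group = 0.1, 25
significant_estimates = []

for _ in range(5000):
    treated = rng.normal(loc=true_effect, size=n_per_group)
    control = rng.normal(loc=0.0, size=n_per_group)
    if ttest_ind(treated, control).pvalue < 0.05:
        significant_estimates.append(treated.mean() - control.mean())

print(f"true effect: {true_effect}")
print(f"average estimate among the 'significant' studies: {np.mean(significant_estimates):.2f}")
# The estimates that survive the filter are several times larger than the truth.
```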
AG: Replication is hard! A colleague and I recently coordinated a replication of one of our own studies – it was an analysis of some survey data, which we replicated by finding new surveys from the same and different years. First it was hard for us to exactly reproduce what we’d done before, then it was a challenge to update our analysis and fix various aspects of our model that we weren’t completely happy with. And then, once we had all the bugs worked out and we’d set up our preregistration plan, it was not so easy to find a journal that was willing to publish our replication. This experience just reinforces my belief that replication should be rewarded, and that researchers should welcome replications of their work, rather than responding defensively.
RW: What do you hope to accomplish with this paper?
JF: As baseball statistician (and inspiration to both of us) Bill James once said: “Sportswriters, in my opinion, almost never use baseball statistics to try to understand baseball. They use statistics to decorate their articles. They use statistics as a club in the battle for what they believe intuitively to be correct. That is why sportswriters often believe that you can prove anything with statistics, an obscene and ludicrous position, but one which is the natural outgrowth of the way that they themselves use statistics. What I wanted to do was teach people instead to use statistics as a sword to cut toward the truth.” In the years since, sports journalism has moved toward the ideals of Bill James. We’d like academic reporting to be as serious and critical as sportswriting.
AG: The tradition in science writing seems to be “scientist as hero,” except on the rare occasions when there’s a scandal, in which case it’s scientist as villain. But science is a human endeavor. I am concerned that statistics is often used as a sort of “uncertainty laundering,” a way to transmute randomness into confident-sounding results.
To the extent that this article is sending a message (Don’t trust hyped, p-hacked studies!), I’m sure we’re preaching to the converted. But the converted deserve a laugh too.
If our paper makes just one person laugh a little bit, it’s all been worth it.
Finally, I wouldn’t be surprised if some readers of Retraction Watch will be annoyed at our paper. Science hype and questionable research practices are a big problem, maybe too important to be joking about. I want to emphasize that our paper is a satire, not a hoax. There is no need to demonstrate that pseudoscience papers can be accepted at leading journals. Our paper is full of the academic equivalent of slapstick humor and is not intended to be confused with a seriously intended work of research.
I thought the paper was a riot – both as a psychological scientist and as a bridge player. But as a bridge player who knows many of the elite players, I thought the funniest line in the paper was this:
“… elite bridge players, whom we assume are otherwise completely typical.”
Ummm… NO.
So basically, everyone still needs to be reminded that correlation is not causation. Stats class spent so much time hammering that into our heads.
This is pointing out that these studies have problems that are even more basic, so we shouldn’t even get to the point of deciding whether the effect we see is meaningful.
There is a lot of science that is based on observational data, and it has to be interpreted much more carefully. If we notice that people with a certain cancer have higher arsenic levels than those who don’t, we can carefully try to make sure that there isn’t another reason. Maybe they were exposed to another chemical that is associated with arsenic, etc. We can look at different populations to make sure the effect is the same. We are never going to deliberately expose people to arsenic, so it will always have to be based on correlations.
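A toy simulation makes the confounding worry concrete (the numbers here are invented, not real epidemiology): arsenic and a disease outcome are both driven by a third, shared exposure, so the two correlate even though arsenic itself does nothing in this toy world; restricting to a narrow band of the other exposure makes the correlation largely disappear.

```python
# Illustration only: invented numbers, not real epidemiology.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
other_chemical = rng.normal(size=n)                  # unmeasured shared exposure
arsenic = other_chemical + rng.normal(size=n)        # correlated with it
disease_risk = other_chemical + rng.normal(size=n)   # caused by it, not by arsenic

print("raw correlation(arsenic, disease):",
      round(np.corrcoef(arsenic, disease_risk)[0, 1], 2))

# Crude "adjustment": look only within a narrow band of the other exposure.
band = np.abs(other_chemical) < 0.1
print("correlation within one stratum of the other chemical:",
      round(np.corrcoef(arsenic[band], disease_risk[band])[0, 1], 2))
```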