We are pleased to present an excerpt from Distrust: Big Data, Data-Torturing, and the Assault on Science, a new book by Pomona College economics professor Gary Smith. The Washington Post said the book’s lessons “are very much needed.”
The fact that changes in bitcoin prices are driven by fear, greed, and manipulation has not stopped people from trying to crack their secret. Empirical models of bitcoin prices are a wonderful example of data torturing because bitcoins have no intrinsic value and, so, cannot be explained credibly by economic data.
Undaunted by this reality, a National Bureau of Economic Research (NBER) paper reported the mind-boggling efforts made by Yale University economics professor Aleh Tsyvinski and a graduate student, Yukun Liu, to find empirical patterns in bitcoin prices.
Tsyvinski currently holds an endowed chair named after Arthur M. Okun, who was a professor at Yale from 1961 to 1969, though he spent six of those eight years on leave so that he could work in Washington on the Council of Economic Advisers. There he served as a staff economist, council member, and then chair, advising presidents John F. Kennedy and Lyndon Johnson on their economic policies. Okun is best known for Okun’s law, which states that a 1 percentage-point reduction in unemployment will increase U.S. output by roughly 2 percent, an argument that helped persuade President Kennedy that using tax cuts to reduce unemployment from 7 to 4 percent would have an enormous economic payoff.
After Okun’s death, an anonymous donor endowed a lecture series at Yale named after Okun, explaining that
Arthur Okun combined his special gifts as an analytical and theoretical economist with his great concern for the well-being of his fellow citizens into a thoughtful, pragmatic, and sustaining contribution to his nation’s public policy.
The contrast between Okun’s focus on meaningful economic policies and Tsyvinski’s far-fetched bitcoin calculations is striking.
Liu and Tsyvinski report correlations between the number of weekly Google searches for the word bitcoin (compared to the average over the past four weeks) and the percentage changes in bitcoin prices one to seven weeks later. They also looked at the correlation between the weekly ratio of bitcoin hack searches to bitcoin searches and the percentage changes in bitcoin prices one to seven weeks later. The fact that they reported bitcoin search results looking back four weeks and forward seven weeks should alert us to the possibility that they tried other backward-and-forward combinations that did not work as well. Ditto with the fact that they did not look back four weeks with bitcoin hack searches. They evidently tortured the data in their quest for correlations.
Even so, only seven of their fourteen correlations seemed promising for predicting bitcoin prices. Owen Rosebeck and I looked at the predictions made by these correlations during the year following their study and found that they were useless. They might as well have flipped coins to predict bitcoin prices.
Liu and Tsyvinski also calculated the correlations between the number of weekly Twitter bitcoin posts and bitcoin returns one to seven weeks later. Unlike the Google Trends data, they did not report results for bitcoin hack posts. Three of the seven correlations seemed useful, though two were positive and one was negative. With fresh data, none were useful.
The only thing that their data abuse yielded was coincidental statistical correlations. Even though the research was done by an eminent Yale professor and published by the prestigious NBER, the idea that bitcoin prices can be predicted reliably from Google searches and Twitter posts was a fantasy fueled by data torturing.
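The worry about unreported backward-and-forward combinations can be illustrated with a small simulation. The sketch below is not Liu and Tsyvinski's procedure. It assumes made-up "search" and "return" series that are pure random noise, a hypothetical signal defined as this week's value minus the average of the prior k weeks, and a rough large-sample 5 percent cutoff for the correlation. It then asks how often at least one look-back/look-forward pair appears statistically significant even though nothing is really there.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def any_window_looks_significant(rng, n_weeks=150, max_lookback=8, max_horizon=7):
    """Return True if any look-back/look-forward pair passes a 5% test on pure noise."""
    searches = [rng.gauss(0, 1) for _ in range(n_weeks)]  # made-up search interest
    returns = [rng.gauss(0, 1) for _ in range(n_weeks)]   # made-up weekly returns
    for k in range(1, max_lookback + 1):
        # Hypothetical signal at week t: searches minus the average of the prior k weeks.
        signal = [searches[t] - sum(searches[t - k:t]) / k for t in range(k, n_weeks)]
        for h in range(1, max_horizon + 1):
            x = signal[:len(signal) - h]       # signal at week t
            y = returns[k + h:]                # return at week t + h
            cutoff = 1.96 / math.sqrt(len(x))  # approximate 5% threshold for |r|
            if abs(pearson(x, y)) > cutoff:
                return True
    return False


def false_positive_rate(trials=200, seed=1):
    """Share of pure-noise data sets in which some window looks significant."""
    rng = random.Random(seed)
    hits = sum(any_window_looks_significant(rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(f"Share of noise data sets with a 'significant' window: {false_positive_rate():.2f}")
```

If only one pre-specified window were tested, about 5 percent of noise data sets would pass. When the best of many windows is reported, the share should be far higher, which is why the choice of four weeks back and seven weeks forward deserves suspicion.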
The irony here is that scientists created statistical tools that were intended to ensure the credibility of scientific research but have had the perverse effect of encouraging researchers to torture data—which makes their research untrustworthy and undermines the credibility of all scientific research.
Traditionally, empirical research begins by specifying a theory and then collecting appropriate data for testing the theory. Many now take the shortcut of looking for patterns in data unencumbered by theory. This is called data mining in that researchers rummage through data, not knowing what they will find.
Way back in 2009, Marc Prensky, a writer and speaker with degrees from Yale and Harvard Business School, claimed that
In many cases, scientists no longer have to make educated guesses, construct hypotheses and models, and test them with data-based experiments and examples. Instead, they can mine the complete set of data for patterns that reveal effects, producing scientific conclusions without further experimentation.
We are hard-wired to seek patterns but the data deluge makes the vast majority of patterns waiting to be discovered illusory and useless. Bitcoin is again a good example. Since there is no logical theory (other than greed and market manipulation) that explains fluctuations in bitcoin prices, it is tempting to look for correlations between bitcoin prices and other variables without thinking too hard about whether the correlations make sense. In addition to torturing data, Liu and Tsyvinski mined their data.
They calculated correlations between bitcoin prices and 810 other variables, including such whimsical items as the Canadian dollar–U.S. dollar exchange rate, the price of crude oil, and stock returns in the automobile, book, and beer industries. You might think I am making this up. Sadly, I am not.
They reported finding that bitcoin returns were positively correlated with stock returns in the consumer goods and health care industries and negatively correlated with stock returns in the fabricated products and metal mining industries. These correlations don’t make any sense and Liu and Tsyvinski admitted that they had no idea why these data were correlated: “We don’t give explanations . . . . We just document this behavior.” A skeptic might ask: What is the point of documenting coincidental correlations?
And that is all they found. The Achilles heel of data mining is that large data sets inevitably contain an enormous number of coincidental correlations that are just fool’s gold in that they are no more useful than correlations among random numbers. Most fortuitous correlations do not hold up with fresh data, though some, coincidentally, will for a while. One statistical relationship that continued to hold during the period they studied and the year afterward was a negative correlation between bitcoin returns and stock returns in the paperboard-containers-and-boxes industry. This is surely serendipitous—and pointless.
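The claim that mined correlations are no more useful than correlations among random numbers is easy to check for oneself. The following sketch assumes a made-up target series and 810 made-up candidate variables, all independent random noise, with 200 observations each (the number of observations is an assumption, not a figure from the paper). It counts how many candidates clear a rough large-sample 5 percent significance cutoff by luck alone.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_chance_hits(n_vars=810, n_obs=200, seed=0):
    """Count random variables that look 'significantly' correlated with a random target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]  # stand-in for returns, pure noise
    cutoff = 1.96 / math.sqrt(n_obs)                  # approximate 5% threshold for |r|
    hits = 0
    for _ in range(n_vars):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]  # unrelated by construction
        if abs(pearson(target, candidate)) > cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    print(f"{count_chance_hits()} of 810 unrelated variables pass a 5% test by chance")
```

On average roughly 5 percent of the 810 candidates, about 40, should pass the test even though every one of them is unrelated to the target by construction. A researcher who reports only those winners has documented nothing but luck.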
Scientists have assembled enormous databases and created powerful computers and algorithms for analyzing data. The irony is that these resources make it very easy to use data mining to discover chance patterns that are fleeting. Results are reported and then discredited, and we become increasingly skeptical of scientists.
Great article. The above also illustrates the challenge of using AI for drug discovery. AI will identify statistical correlations, even if there is no logical causal relationship.
Tyler Vigen has done a superb job illustrating these artifactual correlations (https://www.tylervigen.com/spurious-correlations). As an example, he shows that the divorce rate in Maine is tightly correlated with per capita consumption of margarine (r=0.9926). I wonder what he could do with bitcoin?
I strongly suspect “using AI for drug discovery” will end in the same way as attempts to use a computer to determine potential new liquid rocket fuels did: lots of money expended to obtain useless data. (See John D. Clark, “Ignition”, pp. 171-172.)
Simply data mining for correlations in existing data will certainly generate models, most of them spurious, though not necessarily all.
But reporting those correlations as “science” is ludicrous because they are hypotheses that have not been tested. That appears to be what Liu and Tsyvinski did.
What they needed to do was then wait a couple years and see if their hypothetical predictive models actually matched with real data that was not included in their training set. And it turns out they did not.
Publishing the paper as it stood amounted to publishing possible hypotheses describing previous events. Maybe useful, but certainly mislabeled.
I don’t see anything wrong with generating hypothetical models this way, but that is only the first step of a process, not a final result. And even if you do find a robust result using new data, without some analysis of the underlying causal factors the whole thing is pretty weak tea.
Good article.
I guess it’s worth pointing out that this problem –
“… The fact that they reported bitcoin search results looking back four weeks and forward seven weeks should alert us to the possibility that they tried other backward-and-forward combinations that did not work as well… ”
– was explained very well in an XKCD comic. https://xkcd.com/882/
Grotesque! As grotesque as the claim that tides are induced by the Moon or that mold could cure infectious diseases!
“because bitcoins have no intrinsic value ”
Oh dear, another one. Bitcoin has no intrinsic value – as opposed to the US dollar? Tell us what is the intrinsic value of a piece of paper with numbers and letters on it, much less a record in a digital ledger? Or maybe gold – there’s something inside there that has value? The Chinese didn’t want gold – they dealt in silver for centuries. Did they not know ‘intrinsic’ when they saw it? Someone needs to do some serious thinking about the nature of money before he has anything else to say about the value of Bitcoin. The dollar has no intrinsic value – that’s why the government needs guns to enforce its ‘value.’ Bitcoin doesn’t need guns – that’s the whole point.
I explain in an earlier chapter that “intrinsic value” in finance means the amount you would pay to get the income from the asset forever. Bitcoin, paper money, gold, and silver generate no income and therefore have no intrinsic value. They may be used as a medium of exchange but, as an investment, fools buy hoping to sell to greater fools.
“They reported finding that bitcoin returns were positively correlated with stock returns in the consumer goods and health care industries and negatively correlated with stock returns in the fabricated products and metal mining industries.”
This seems to be an artifact of history; Bitcoin was at its highest during the earlier stages of Covid-19, a good time for healthcare corporations and toilet paper brokers, and not as lucrative a time for industrial products. An excellent way to predict the analysis timeframe? Yes. Historical fallacy? Clearly.