Nearly 100 years ago, Muriel Bristol refused to drink a cup of tea that had been prepared by her colleague, the great British statistician Ronald Fisher, because Fisher had poured milk into the cup first and tea second, rather than tea first and milk second. Fisher didn’t believe she could tell the difference, so he tested her with eight cups of tea, four poured milk first and four tea first. When she got all eight correct, Fisher calculated the probability that a random guesser would do as well: 1 in 70, or about 1.4%. He soon recognized that the results of agricultural experiments could be gauged in the same way – by the probability that random variation alone would generate the observed outcomes.
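For readers who want to check the arithmetic, here is a minimal Python sketch (the original post contains no code, so this is purely illustrative). A guesser who knows that exactly four of the eight cups were poured milk first has one correct labelling out of the C(8, 4) = 70 possible ways to pick the four "milk-first" cups:

```python
from math import comb

# One favourable labelling out of C(8, 4) ways to choose which
# four cups to call "milk first".
p_all_correct = 1 / comb(8, 4)   # = 1/70

print(f"P(all eight correct by chance) = {p_all_correct:.4f}")  # ~0.0143, i.e. about 1.4%
```

This 1/70 figure is the one-sided P-value of Fisher's exact test for the lady-tasting-tea experiment, and it falls below the 5% cutoff discussed next.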
If this probability (the P-value) is sufficiently low, the results might be deemed statistically significant. How low? Fisher recommended we use a 5% cutoff and “ignore entirely all results which fail to reach this level.”
His 5% solution soon became the norm. Not wanting their hard work to be ignored entirely, many researchers strive mightily to get their P-values below 0.05.