This has been on my mind for a while.
I think the observation is best summed up as:
If somebody asks if something is statistically significant, they probably don’t know what it means.
I don’t mean to offend anyone, and I can think of plenty of counter examples*, but this is born of long observation of conversations among both scientists and non-scientists. “Statistically significant” is sometimes used as a proxy for “true”, and is sometimes muddled with “significant”, “meaningful” or “large”. In climate, it also gets confused with “caused by human activity”.
Even those who have done lots of statistics can forget that it only tells you how likely you would be to see something, given an assumption (the null hypothesis) that you think probably isn’t true.
It’s one of those horrible, slippery concepts that won’t stay in the brain for any length of time, so you** have to go over it again, and then again, every time to make sure that yes, that’s what it really means, and yes, that’s how it fits in with your problem.
No wonder people don’t know what it means.
*This is just a personal observation, but there is data out there. I’m sure people will provide counter examples in the comments.
** And by you, I really mean I.
Interesting post and I think you do make a very good point. Do you think that the problem might stem from people’s academic backgrounds? If one was, for example, doing a proper pharmaceutical study, then, if it was done properly, there would be a control group and a test group. Comparing the control group with the test group could then tell you if the effect of a particular drug was significant or not. One could find similar examples in other areas.
On the other hand, if one is working in an area like climate science, then one can take lots of measurements (surface temperatures for example) and can determine trends and uncertainties in trends, but there’s no control. So, as you say in the post, the uncertainties tell you something about the likelihood of a particular trend. Saying it’s statistically significant or not doesn’t make sense because, typically, that’s being judged against a zero trend and there’s no actual control group that says a zero trend is what is expected.
Of course, I could just be very confused myself 🙂
It is a concept that is pretty widely misunderstood, apparently even by stats teachers:
Misinterpretations of Significance: A Problem Students Share with Their Teachers?
Heiko Haller & Stefan Krauss
A large problem is that frequentist hypothesis tests are easily performed, but deeply counter-intuitive, so it is perhaps unsurprising that many people use them without really understanding what is being done. Unfortunately, on the other hand, the Bayesian equivalents are easier to understand but much harder to perform, so they are rarely used.
The thing to do is to always search for the interpretation that provides as little support for your position as possible.
andthentheresphysics, statistical significance is also useless in the drug testing case. All it tells you is whether two groups are different. No two groups of people or animals will ever be exactly the same, so whether you give the drug or not, the two groups are different. So significance is purely determined by the sample size and the ability of the researcher to reduce individual variability.
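To put rough numbers on the sample-size point, here is a minimal sketch. It is my own illustration, not anything from a real trial: it assumes a two-sample z-test with a known standard deviation of 1 in both groups, and it assumes the observed difference happens to equal a tiny fixed difference of 0.05 standard deviations. The function name `two_sided_p` and all the numbers are made up for the example.

```python
import math

def two_sided_p(effect_in_sd, n_per_group):
    """Two-sided p-value of a two-sample z-test (assumed known SD of 1 per group)."""
    # Standard error of the difference between two group means is sqrt(2 / n).
    z = effect_in_sd / math.sqrt(2.0 / n_per_group)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))

# The same tiny difference, tested with ever larger groups.
for n in (100, 1000, 10000, 100000):
    print(n, two_sided_p(0.05, n))
```

Under these assumptions the difference itself never changes, only the group size does, yet the p-value goes from nowhere near 0.05 to far below it.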
So at what odds do you bet your wealth or your life?
Yes, I think an odds approach would be informative in many of the cases currently under discussion.
If we choose our statistical model to represent natural variation—as is usual—and we find some observations that are statistically significant, then how would you interpret that? Put another way, our assumption is that the model represents natural variation and we have observations that lie outside the expected range of the model; so what led to those observations?
I know of only two interpretations for those observations. One interpretation is that something very unusual happened just by chance (perhaps because the significance level was not set conservatively enough). The other interpretation is that there was some non-natural variation, which was presumably caused by human activity.
“If somebody asks if something is statistically significant, they probably don’t know what it means.”
Caveat: On the internet people posting under pseudonyms may actually be statisticians, and you won’t make a good impression if you actually quote this! ;o)
1) The word “probably” is key here 2) even statisticians can confuse statistical and practical significance in systems that they don’t understand very well.
1) Indeed, the point I was making was really that although it may be true that the people who ask if something is statistically significant often don’t know what it means, you have to consider the question as if they did anyway, rather than disregard it. It may be that they were actually making a valid point, or worse still pointing out a problem with your position, and you would look pretty bad if you just dismissed the question as probably ignorant.
2) Agreed, the paper I mention in my earlier comment is a classic.
Sadly I am now in the position to confirm that “and you would look pretty bad if you just dismissed the question as probably ignorant.” is true. Great way to alienate those who support your general position because of an unwillingness to acknowledge a minor limitation in the statistical support for a claim.
I see this has had some activity recently. Here’s my take.
There’s a bit more to significance tests than discussed here. The basic Fisher/Neyman/Pearson idea is that you have some hypothesis you are testing and work out the probability of observing the data you have if this hypothesis were true. If this probability is small enough (usual conventions 5%, 1%, etc.) then you say that there is evidence against the tested hypothesis. What you are working out is the probability of the data given the hypothesis. A lot of people confuse this idea with the probability of the hypothesis given the observed data. Some people (mainly Bayesians?) say that it’s the latter probability that we want. But hard-line Frequentists, like myself, claim that such a probability makes no sense in a frequentist world. Hypotheses are either true or false.
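A toy example of “the probability of the data given the hypothesis”, as a sketch only. The setup is assumed for illustration: the tested hypothesis is that a coin is fair, the data are 60 heads in 100 flips, and, as is usual in practice, “the data” is taken to mean a result at least as far from 50 heads as the one observed. The function name `fair_coin_p_value` is hypothetical.

```python
from math import comb

def fair_coin_p_value(heads, flips):
    """P(a head count at least as far from flips/2 as observed | the coin is fair)."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    # Every head count that is at least as extreme as the observed one.
    extreme = [k for k in range(flips + 1) if abs(k - expected) >= observed_gap]
    # Under a fair coin each sequence of flips has probability 1 / 2**flips.
    return sum(comb(flips, k) for k in extreme) / 2 ** flips

p = fair_coin_p_value(60, 100)
print(p)
```

Whatever number comes out is a statement about the data under the assumption that the coin is fair. It is not the probability that the coin is fair, which is exactly the confusion described above.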
To work out the probability of the data given the null hypothesis we need to know about the distribution of the sample statistic used. This is where a lot of nonsense gets published, even in good journals. The problem is that the tabulated significance levels for the usual test statistics are only accurate if certain assumptions are made about the underlying data generating process. If these assumptions are not met then the actual probabilities may be very different from what we might call the conventional “nominal” values. These assumptions can be tested. For example, you can test to see whether your regression equation has serially correlated residuals. If it does, your nominal significance levels are meaningless. Something systematic has been left out of your equation which will bias the results. I suspect that most simple regressions of temperature on a trend suffer from this problem, because temperature in one period tends to be close to that of the neighbouring periods.
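The gap between nominal and actual significance levels can be seen in a small simulation. This is a sketch under assumptions of my own, not an analysis of any real temperature record: the series are pure noise with no trend at all, the serial correlation is modelled as AR(1) with an arbitrary coefficient of 0.6, and 1.984 is used as the approximate two-sided 5% critical value of a t distribution with 98 degrees of freedom. All function names are hypothetical.

```python
import math
import random

def ols_trend_t(y):
    """t-statistic for the slope when y is regressed on time 0..n-1 by ordinary least squares."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((v - intercept - slope * t) ** 2 for t, v in enumerate(y))
    # Textbook standard error, which assumes independent residuals.
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se

def ar1_series(n, phi, rng):
    """Trend-free noise where each value is phi times the previous one plus white noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def false_positive_rate(phi, n=100, trials=500, t_crit=1.984, seed=1):
    """Share of trend-free series in which the nominal 5% test 'finds' a trend."""
    rng = random.Random(seed)
    hits = sum(abs(ols_trend_t(ar1_series(n, phi, rng))) > t_crit
               for _ in range(trials))
    return hits / trials

rate_white = false_positive_rate(phi=0.0)  # independent noise
rate_ar1 = false_positive_rate(phi=0.6)    # serially correlated noise
print(rate_white, rate_ar1)
```

With independent noise the test should reject in roughly 5% of runs, as advertised. With serially correlated noise the same test rejects far more often, even though there is no trend in any of the series.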
Deborah Mayo is a philosopher of science at Virginia Tech who has written extensively on these issues. She has developed a related concept of “severity”. Not all significance tests are equal. To get good evidence for a hypothesis you need a severe test. A severe test is a test where there is a very high probability that the test procedure would NOT give a pass if the hypothesis is false. Some tests are much too easy to pass to give good evidence. Her blog at errorstatistics.com is well worth looking at. And as far as I know she has never said anything about climate; this is just pure philosophy of statistics!