Here is the text of an email that I sent to Doug Keenan on 25th January 2013. It sets out some of my personal thoughts on statistical modelling of trends in global mean temperature (or many of the other timeseries in the Earth system).
I believe it has some bearing on this post at Bishop Hill.
Dear Doug Keenan,
Thanks for clarifying your concerns somewhat; hopefully I can set your mind at rest on some of
your worries. I suspect that parliamentary questions are not the best route through which to
discuss some of the technical aspects of statistical theory.
Please rest assured that we do not use a linear statistical analysis of global temperature trends
in anything other than a descriptive manner. I believe that the trends were reported as linear as a
method of description for consistency with the IPCC Fourth Assessment Report. The report
itself acknowledges the deficiencies of the statistical model that it uses, while arguing that the
analysis still gives some useful information. I’m inclined to agree.
These trend lines are just there to summarise the data, and shouldn’t be used (for example) to
extrapolate a forecast of global temperatures in the future. They shouldn’t be taken as
representing the “true” statistical model for global warming. Perhaps you might suggest other
methods for summarising the data?
I think the problem is the proxy war being fought around the word “significant”, and the difference
between its scientific and formal statistical use. We should be careful to separate out their use.
There appear to be individuals who would like to say that the evidence for warming is “not
statistically significant”, and others who would like to say that it is. This is to misunderstand the
nature of “significance”.
Of course, we both know that the correct answer to the question “is the trend of global mean
temperature statistically significant?” is not “yes”, or even “no”, but “that is not a valid question”.
This is because the appropriate statistical model to use for the timeseries is not known perfectly,
and any statistical significance test uses assumptions about that timeseries that may turn out to
be invalid. Cohn and Lins (2005) for example, state “significance depends critically on the null
hypothesis which in turn reflects notions about what one expects to see.”
This does not mean that there is no evidence of a significant increasing temperature trend
(there is plenty of scientific evidence for this), just that using naive statistical tests in the absence
of other information is inappropriate.
A significance test attempts to answer the question “given that there was no anthropogenically
driven global warming, what is the probability that we would see these temperatures?” This is
interesting, but not really what we are looking for. As you and others have noted, this kind of test
does not allow you to distinguish between forced trends, and the degree of long term persistence
in the system. I would suggest a Bayesian solution to this problem. I think the appropriate
question is “given that we see these temperatures, what is the probability they are
anthropogenically driven?” Using Bayes’ theorem, this combines the likelihood (from the first
question), with the prior probability that global temperatures are anthropogenically driven, to
some degree. Of course, this prior probability contains subjective judgements and information
from elsewhere, including fundamental physics.
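As a minimal sketch of the Bayesian update described above, the following Python snippet applies Bayes’ theorem to a hypothesis and its complement. The function name `posterior_probability` is hypothetical, and every number is a made-up illustration rather than a value from any real analysis; in practice the prior and the likelihoods would have to come from physical understanding and model simulations.

```python
def posterior_probability(prior_h, likelihood_h, likelihood_not_h):
    """Return P(H | data) via Bayes' theorem for a hypothesis H and its complement.

    prior_h          -- P(H), the prior probability of the hypothesis
    likelihood_h     -- P(data | H)
    likelihood_not_h -- P(data | not H)
    """
    if not 0.0 <= prior_h <= 1.0:
        raise ValueError("prior_h must lie in [0, 1]")
    # Total probability of the data under both hypotheses (the "evidence").
    evidence = likelihood_h * prior_h + likelihood_not_h * (1.0 - prior_h)
    if evidence == 0.0:
        raise ValueError("the data have zero probability under both hypotheses")
    return likelihood_h * prior_h / evidence


# Illustrative, made-up inputs only (assumptions, not estimates).
prior = 0.5                  # prior probability that the warming is forced
p_data_given_forced = 0.6    # likelihood of the observations if forced
p_data_given_natural = 0.2   # likelihood of the observations if unforced

post = posterior_probability(prior, p_data_given_forced, p_data_given_natural)
print(f"Posterior probability of the forced hypothesis: {post:.2f}")
```

The point of the sketch is only that the likelihood (what a significance test speaks to) is one ingredient; the answer also depends on the prior, which carries the information from outside the temperature series.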
Our judgements about the probability that temperatures are anthropogenically driven contain
information not just from the global temperature trend, but also our knowledge of the way that the
system works. The global temperature series in isolation simply does not contain the information
we are looking for.
The Bayesian solution cuts through the problem that you identified, of being able to tell the
difference in the validity of (as in your example), a linear trend with an autoregressive element,
and a driftless autoregressive integrated model. The model that we choose has to be consistent
with the understanding of the system that we gain from other data, and our basic knowledge of
the physics of the system. We encode our knowledge about physics, and the rest of the system
in climate models, and their simulations. Again, I stress that a great deal of work has been done
in this field, but that it is more likely to be found in the detection and attribution literature (and
corresponding IPCC chapter), than in the observations literature.
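The difficulty of telling a forced trend from persistence can be illustrated with a small simulation. The sketch below (an illustration under stated assumptions, not anyone’s published method) generates trendless series from two models, independent white noise and a driftless random walk, fits an ordinary least-squares line to each, and counts how often a naive test that assumes independent residuals reports a “significant” trend. The series length of 130, the number of simulations, the threshold of 1.98 (roughly the two-sided 5% critical value at about 128 degrees of freedom) and all function names are assumptions chosen for illustration.

```python
import numpy as np


def trend_t_statistic(y):
    """t-statistic of the OLS slope of y against time, assuming independent residuals."""
    n = len(y)
    t = np.arange(n, dtype=float)
    t_centered = t - t.mean()
    sxx = np.sum(t_centered ** 2)
    slope = np.sum(t_centered * (y - y.mean())) / sxx
    residuals = (y - y.mean()) - slope * t_centered
    sigma2 = np.sum(residuals ** 2) / (n - 2)  # residual variance
    return slope / np.sqrt(sigma2 / sxx)


def white_noise(rng, n):
    """Independent noise with no trend and no persistence."""
    return rng.standard_normal(n)


def random_walk(rng, n):
    """Driftless integrated series: the cumulative sum of independent noise."""
    return np.cumsum(rng.standard_normal(n))


def false_positive_rate(simulate, n_series=500, n_years=130, threshold=1.98, seed=0):
    """Fraction of trendless simulated series whose naive trend test exceeds the threshold."""
    rng = np.random.default_rng(seed)
    hits = sum(
        abs(trend_t_statistic(simulate(rng, n_years))) > threshold
        for _ in range(n_series)
    )
    return hits / n_series


rate_white = false_positive_rate(white_noise)
rate_walk = false_positive_rate(random_walk)
print(f"Naive 'significant trend' rate, white noise: {rate_white:.2f}")
print(f"Naive 'significant trend' rate, random walk: {rate_walk:.2f}")
```

Neither simulated model contains a forced trend, yet the naive test flags the random-walk series far more often than the white-noise series. The verdict of the test therefore depends on which statistical model is assumed, which is why the choice has to be constrained by information from outside the series itself.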
In conclusion, my suggestion is that when asked if there has been a “significant” change in
temperatures since the 1880s, we should say “yes”. If we are asked if there has been a
statistically significant change in temperatures since the 1880s, we do not say “yes” or “no”, we
say “that is not a valid question”. A difficulty may be in getting people to understand the reasoning
sufficiently to accept the latter as the correct answer to the question. In this, I welcome your help.
Cohn, T.A. and Lins, H.F. (2005) Nature’s style: Naturally trendy, Geophys. Res. Lett., 32,
I sent a reply to that, on 27 January 2013. Following is an extract from my reply. (There was no response.)
Your message says the following.
What is the reason?
Your message also says this.
That surely is a valid question! There are other situations where the data is well enough understood to select a model, albeit not with absolute certainty. What we should say in this situation is “we are currently unable to determine if the increase is statistically significant”.
I agree that incorporating some physics into the statistical model is desirable, and probably necessary to select a model. Indeed, that was why I was so interested in the paper of Koutsoyiannis [Physica A, 2011]: it is the only serious attempt to incorporate physics that I have seen. As you know, though, I found that there is an invalidating problem with the paper (and my finding that problem shows I will criticize statistical work on the skeptical side too).
Thanks for your contribution.
What makes the rise of 0.8C between 1880 and today significant? And why 1880? Is that because we don’t have reliable temperature records before 1880? I haven’t the remotest clue about the statistics and am not qualified to speak to them, but if the average global temperature (I don’t even know if I’d describe this as a useful metric over the timeframe unless we are looking at exactly the same thermometers in exactly the same places, but admit there is a need for a metric) rises from 15C to 15.8C over a period of 130 years, what makes that significant?
Hi geronimo, thanks for the interesting questions. See my response to Tom on ‘hypotheticals’ (particularly ones that might require some degree of deeper analysis). I would say that there are a number of things which might make a temperature change ‘scientifically significant’ (as opposed to ‘statistically significant’). First is the magnitude of that change, compared to our estimates of the natural variability of the Earth system. Second is the nature of that change, when looking at the details (also search for stratospheric cooling, polar amplification, ocean heat changes, land-ocean warming differentials). Third is the magnitude of Earth system, biospheric, and human system changes that might be brought about due to such a change.
You can find out more about uncertainty in temperature records through time here, and, if you’d like to get involved with changing those uncertainties, here.
Hi Doug, you’ll have to be patient with me. I was looking for the answers: I had expected that the increase would be significant if it was above what was expected from natural variability, and I take on board the other things you say you would take into consideration. So what is so different about the 0.8C rise in global temperature anomaly that makes it significant? I’ll leave that one there.
I didn’t think the approx 0.4C rise in the sixty year period was an hypothetical question, it seems to me that if a rise of 0.8 in 120 years is significant then I would have expected a rise of 0.4C in 60 years to have some level of significance. I recognise from your answer that it’s not simple, or rather it can’t be explained simply, but in all these matters I stick with Rutherford’s dictum:
“If you can’t explain your theory to a barmaid, it probably isn’t good physics.”
OK in this instance a barman.
I’ve looked at the two sites you posted, and was, to be honest, puzzled as to why you’d sent me there. The first site was the usual impenetrable Met Office site built for other researchers, and the second site wanted me to send them temperature readings from the Arctic circle.
Then the penny dropped, you were angry at me because I had suggested that I didn’t believe that the GATA meant much without consistent data. You really shouldn’t be so thin-skinned. I didn’t say anyone was doing anything untoward, I merely said that the GATA would have been more meaningful if the data had come from a consistent set of measuring instruments. It doesn’t, so there must be errors. That the researchers try to iron out those errors isn’t in question, and was never intended to be.
Wouldn’t mind a proper answer to my two “interesting” questions neither of which was hypothetical because presumably you use some scientific techniques to assign “scientifically significant” which you can apply equally to the two end points of 1940 and 2000. Or were you annoyed at the questions as well? I do seem to have rubbed you up the wrong way for which I would apologise if I could figure out how I’d done it.
I’m afraid you may have read a little too much into my comment: there was no offence taken, and certainly none meant. The sites I pointed to offer good starting points for understanding the uncertainties involved in measures of global temperature. You’ll have to put in a bit of leg work though – I’m afraid my time for answering questions here is highly limited.
Really pleased that I hadn’t caused any offence, I don’t mind causing offence, but like it to be intentional. I guess your response is telling me you don’t intend to tell me how “scientific significance” is defined because you’re too busy. How about pointing me to the scientific literature? It does seem an awfully fuzzy concept to the non-cognoscenti.
Hi Doug, me again, it’s unusual to have access to the Met Office’s thinking so I’ll ask a supplementary if I may. If we’d stopped the clock in 1940, we would be experiencing a rise in temperature of approx. 0.4C over a 60 year period. Would you describe that as significant? If not, why not?
To me it’s significant because it is highly unlikely to have been caused by human emissions. So, logically, we could only ascribe approx. 0.5 rise in temperature to human emissions, yet I believe the IPCC, and Met Office have agreed that most of the rise in temperature during the 20th century is very likely caused by human emissions. Is that 0.5C the bit they’re talking about?
What about the 0.0C rise since 1997? Is that significant?
I think you can probably work out my feelings on this by reading the post above.
But if there _had_ been a 0.2C rise in temperature since 1997, you would definitely say that was significant, wouldn’t you?
So, “given that we see these temperatures, what is the probability they are anthropogenically driven?”
I’d rather not get into hypothetical musings, thanks.
Well you leave it to us to hypothesize why you reject a statistical model that is 1000 times better than yours.
It’s a pleasure to read such polite and to-the-point discussion in this field (I say that as a regular reader at Climate Audit, Bishop Hill and others). However, I have to side with Keenan here – it seems that the best answer to the question of the statistical significance of the increase since 1880 is ‘we don’t have enough evidence/knowledge to say for sure one way or the other’. This goes as much for selection of an appropriate model as it does for the inclusion of appropriate physics etc in the Bayesian priors. The rise (so far) has not been dramatic enough to be firmly attributed one way or the other.
Skeptics may want to be able to claim ‘look, the rise isn’t significant’, and one can understand the government and Met Office not wanting to hand over that kind of ammunition, but it’s equally true that the government and Met Office want to be able to continue to claim ‘look, the rise is significant’. It’s good for all sides to have the discussion, reasoning and evidence in the open.
Interesting exchange. The outcome of any statistical test is only as valid as the statistical assumptions that underlie that test. Which, I think, is your point.
I’ve been reading: “Using data to attribute episodes of warming and cooling in instrumental records”, by Tung and Zhou (http://www.pnas.org/content/110/6/2058.short). Interesting reading. The conclusions are only as strong as the underlying assumptions are (does a UK record represent long term AMO change?, etc). Nice discussion here (http://arstechnica.com/science/2013/01/a-climate-seesaw-in-the-atlantic/)
In your exchange with Douglas, you refer to Detection and Attribution – where physical understanding is brought into the framework. The kind of thing that is shown here:
where the observed temperature trends appear to be consistent with the anthropogenically and naturally forced simulations, but not with the natural only. This kind of attribution is not Bayesian. It is testing whether observed trends are consistent with either natural or anthropogenic signals, using estimates of what is physically expected from internal climate variability.
This physical understanding is central.
It doesn’t really matter what statistical method was used. When the conclusion is absurd [CAGW] the whole analysis is junk!
Thanks for that razor sharp analysis.
The main problem seems to be the IPCC attributing all the warming to CO2. Prof. Akasofu has a good point in suggesting that there was a fairly steady rise in temperature since the Little Ice Age that should be subtracted from this.
two_natural_components_recent_climate_change.pdf
Caveat clicker, that link is over 50 megs, and unlikely to shed much light on the subject, in my opinion.
Why do you say that? Although the pdf file is long, it is largely graphical and doesn’t take too long to read.
If you don’t want to spend the time, at least look at figure 2b on page 7, that summarizes much of the conclusions of the first part.
[…] for the Keenan Kerfuffle, my only negative comment on Doug McNeal’s response to Keenan is that he might have edited out the line breaks so the text wouldn’t display badly when I […]
I think statistical significance can only be defined with respect to the level of noise: possible choices are:
1) statistically significant with respect to measurement uncertainty
2) statistically significant with respect to interannual variability
3) statistically significant with respect to natural variability
I think that choice “2” is the usual metric that people think of, and is well answered by providing a linear trend and uncertainty (perhaps with some autoregression). “3” is really a detection and attribution question, and would be best answered by using all the information available: if we are limited to just the temperature series itself, we’d have to throw up our hands, because how can one use it to differentiate between a large natural drift and a forced anthropogenic trend?
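A minimal sketch of choice “2”, a linear trend with an uncertainty that allows for some autoregression, is given below. It uses one common approach, which is an assumption here and not necessarily what any commenter or institution uses: fit the line by least squares, estimate the lag-1 autocorrelation of the residuals, and widen the standard error using an effective sample size n_eff = n(1 - r1)/(1 + r1). The data are synthetic (an assumed trend of 0.006 per year plus AR(1) noise with made-up parameters), and the function name is hypothetical.

```python
import numpy as np


def trend_with_ar1_uncertainty(y):
    """Return (slope, naive_se, adjusted_se, r1) for an OLS trend of y against time.

    adjusted_se widens the naive standard error using an effective sample
    size based on the lag-1 autocorrelation r1 of the residuals.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    t_centered = t - t.mean()
    sxx = np.sum(t_centered ** 2)
    slope = np.sum(t_centered * (y - y.mean())) / sxx
    residuals = (y - y.mean()) - slope * t_centered
    ss_res = np.sum(residuals ** 2)

    # Lag-1 autocorrelation of the residuals; negative values are clipped to
    # zero so that the adjustment never shrinks the uncertainty.
    r1 = max(np.sum(residuals[:-1] * residuals[1:]) / ss_res, 0.0)
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    if n_eff <= 2.0:
        raise ValueError("effective sample size too small for a trend uncertainty")

    naive_se = np.sqrt(ss_res / (n - 2) / sxx)
    adjusted_se = np.sqrt(ss_res / (n_eff - 2.0) / sxx)
    return slope, naive_se, adjusted_se, r1


# Synthetic example: assumed trend plus AR(1) noise (all parameters made up).
rng = np.random.default_rng(1)
n_years = 130
noise = np.zeros(n_years)
for i in range(1, n_years):
    noise[i] = 0.6 * noise[i - 1] + 0.1 * rng.standard_normal()
series = 0.006 * np.arange(n_years) + noise

slope, naive_se, adjusted_se, r1 = trend_with_ar1_uncertainty(series)
print(f"trend = {slope:.4f} per year, lag-1 autocorrelation = {r1:.2f}")
print(f"naive SE = {naive_se:.5f}, AR(1)-adjusted SE = {adjusted_se:.5f}")
```

This adjustment only accounts for short-term, AR(1)-type persistence in year-to-year variability. It does not address long-term persistence or an integrated model, so it says nothing about choice “3”.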