And to think a few weeks ago I was thinking that not much had been happening in climate science…now here’s another interesting issue. I previously blogged quite a lot about the Cox et al paper (see here, here, here). It generated a lot of interest in the scientific community and I’m not terribly surprised that it provoked some comments which have just been published and which can be found here (along with the reply) thanks to sci-hub.
My previous conclusion was that I was sceptical about the result, but that the authors didn’t seem to have done anything badly wrong (contrary to the usual situation when people generate surprising results concerning climate sensitivity). Rather, it seemed to me like a case of a somewhat lucky outcome, where a succession of reasonable choices in the analysis had turned up an interesting result. It must be noted that this sort of fishing expedition does invalidate any tests of statistical significance and thus also invalidates the confidence intervals on the Cox et al result, but I didn’t really want to pick on that because everyone does it and I thought that hardly anyone would understand my point anyway.
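To illustrate the point about fishing expeditions, here is a minimal sketch using pure noise. Everything in it is made up for illustration: the number of models, the number of analysis choices, and the assumption that each choice yields an independent predictor (real analysis choices are correlated, so this is closer to a worst case). It compares how often a single pre-specified test passes a nominal 5% threshold with how often the best of several choices does.

```python
import numpy as np

def abs_corr(x, y):
    """Absolute Pearson correlation along the last axis (broadcasts)."""
    xc = x - x.mean(axis=-1, keepdims=True)
    yc = y - y.mean(axis=-1, keepdims=True)
    num = (xc * yc).sum(axis=-1)
    den = np.sqrt((xc ** 2).sum(axis=-1) * (yc ** 2).sum(axis=-1))
    return np.abs(num / den)

rng = np.random.default_rng(1)
n_models, n_choices, n_trials = 16, 10, 5000  # illustrative values only

# Calibrate the 5% threshold for |r| on an independent null sample.
null_r = abs_corr(rng.standard_normal((20000, n_models)),
                  rng.standard_normal((20000, n_models)))
threshold = np.quantile(null_r, 0.95)

# Pure noise: the "sensitivity" is unrelated to every candidate predictor.
sensitivity = rng.standard_normal((n_trials, 1, n_models))
predictors = rng.standard_normal((n_trials, n_choices, n_models))
r = abs_corr(predictors, sensitivity)  # shape (n_trials, n_choices)

single_rate = np.mean(r[:, 0] > threshold)        # one pre-specified choice
fished_rate = np.mean(r.max(axis=1) > threshold)  # best of all choices

print(f"single choice: {single_rate:.3f}, best of {n_choices}: {fished_rate:.3f}")
```

Under these assumptions the single-choice rate sits near the nominal 5%, while the best-of-several rate is several times larger, which is why the nominal confidence intervals no longer mean what they claim.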
The comments generally focus on the use of detrended intervals from the 20th century simulations. This was indeed the first thing that came to my mind as a likely Achilles’ heel of the Cox et al analysis. I don’t think I showed it previously, but the first thing I did when investigating their result was to have a play with the ensemble of simulations that had been performed with the MPI model. The Cox et al analysis depends on an estimate of the lag-1 autocorrelation of the annual temperatures of the climate models. Ideally, if you want to calculate the true lag-1 autocorrelation of model output, you should use a long control run (ie an artificial simulation in which there is no external forcing). Of course there is no direct observational constraint on this, but nevertheless this is one of the model parameters involved in the Cox et al constraint, for which they claim a physical basis.
As well as having a long control simulation of the MPI model, there is also a 100-member ensemble of 20th century simulations using it. The size of this ensemble means that as well as allowing an evaluation of the Cox et al detrending approach to estimate the autocorrelation, we can also test another approach which is to remove the ensemble average (which represents the forced response) rather than detrending a single simulation.
This figure shows the results I obtained. And rather to my surprise….there is basically no difference. At least, not one worth worrying about. For each of the three approaches, I’ve plotted a small sample from the range of empirical results (the thin pale lines) with the thicker darker colours showing the ensemble mean (which should be a close approximation to the true answer in each case). For the control run I used short chunks comparable in length to the section of 20th century simulation that Cox et al used. The only number that actually matters from this graph is the lag-1 value which is around 0.5, the larger lags are just shown for fun. The weak positive values from 5-10 years probably represent the influence of the model’s El Nino. It’s clear that the methodological differences here are small both in absolute terms and also relative to the sample variation across ensemble members. That is to say, sections of control runs, or 20th century simulations which are either detrended or which have the full forced response removed, all have a lag-1 autocorrelation of about 0.5 albeit with significant variation from sample to sample.
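The three approaches compared above can be sketched on synthetic data. This is not the MPI ensemble: it assumes AR(1) internal variability with a true lag-1 coefficient of 0.5, a made-up linear forced trend (which makes linear detrending an idealised best case), and a 55-year segment length chosen only for illustration.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def linear_detrend(x):
    """Remove a least-squares straight line from a 1-D series."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def simulate_ar1(n_members, n_years, alpha, rng):
    """Ensemble of stationary AR(1) series, one row per member."""
    noise = rng.standard_normal((n_members, n_years))
    x = np.zeros((n_members, n_years))
    x[:, 0] = noise[:, 0] / np.sqrt(1.0 - alpha ** 2)  # stationary start
    for t in range(1, n_years):
        x[:, t] = alpha * x[:, t - 1] + noise[:, t]
    return x

rng = np.random.default_rng(0)
n_members, n_years, alpha = 100, 55, 0.5  # illustrative values only

internal = simulate_ar1(n_members, n_years, alpha, rng)  # "control" chunks
forced = 0.02 * np.arange(n_years)                       # made-up forced trend
historical = internal + forced                           # "20th century" runs

# 1. Unforced segments, used as they are.
control = np.mean([lag1_autocorr(m) for m in internal])
# 2. Each forced run linearly detrended on its own.
detrended = np.mean([lag1_autocorr(linear_detrend(m)) for m in historical])
# 3. Ensemble mean (the forced response) removed from every member.
anomalies = historical - historical.mean(axis=0)
ens_removed = np.mean([lag1_autocorr(m) for m in anomalies])

print(f"control {control:.2f}, detrended {detrended:.2f}, "
      f"ensemble mean removed {ens_removed:.2f}")
```

In this toy setup the three ensemble-mean estimates land close together and slightly below the true 0.5, since short segments bias the sample autocorrelation low, while individual members scatter much more widely than the methods differ.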
Of course this is only one model out of many, and may not be repeated across the CMIP ensemble, but this analysis suggested to me that this detrending approach wasn’t a particularly important issue and so I did not pursue it further. It is interesting to see how heavily the comments focus on it. It seems that the authors of these got different results when they looked at the CMIP ensemble.
One thing I’d like to mention again, which the comments do not, is that the interannual variability of the CMIP models is actually more strongly related to sensitivity than either the autocorrelation or Cox’s Psi function (which involves both these parameters) is. Here is the plot I showed earlier, which is of course a little bit dependent on the outlying data points (as was commented on my original post). This is sensitivity vs SD (calculated from the control runs) of the CMIP models.
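For reference, a minimal sketch of how the two ingredients combine. It assumes the form usually quoted for the Cox et al metric, Psi = SD / sqrt(-ln(alpha1)), where SD is the interannual standard deviation and alpha1 the lag-1 autocorrelation; that form is an assumption here and should be checked against the paper. The function names are hypothetical, and any windowing or detrending is left to the caller.

```python
import numpy as np

def psi_from_moments(sd, alpha1):
    """Psi from a standard deviation and a lag-1 autocorrelation.

    Assumed form: sd / sqrt(-ln(alpha1)), defined only for 0 < alpha1 < 1.
    """
    if not 0.0 < alpha1 < 1.0:
        raise ValueError("alpha1 must lie strictly between 0 and 1")
    return sd / np.sqrt(-np.log(alpha1))

def psi_from_series(x):
    """Psi estimated from a 1-D series that is already detrended."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    sd = x.std(ddof=1)
    alpha1 = np.dot(x[:-1], x[1:]) / np.dot(x, x)
    return psi_from_moments(sd, alpha1)

# With alpha1 = exp(-1) the square root equals 1, so Psi is just the SD.
example = psi_from_moments(0.1, np.exp(-1.0))
```

Because the autocorrelation only enters through the square root of a logarithm, Psi under this form varies almost in proportion to the SD unless alpha1 is close to 0 or 1, which may be part of why the SD alone does so well.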
I don’t know why this is so, I don’t know whether it’s important, and I’m surprised that no-one else bothered to mention it, as interannual variability is probably rather less sensitive than autocorrelation is to the detrending choices. Maybe I should have written a comment myself.
In their reply to the comments, Cox et al now argue that their use of a simple detrending means that their analysis includes a useful influence of some of the 20th century forced response, which "sharpens" the relationship with sensitivity (which is weaker when the CMIP control runs are used). This seems a bit weak to me and, as well as basically contradicting their original hypothesis, also breaks one of the fundamental principles of emergent constraints, that they should have a clear physical foundation. At the end of the discussion I’m more convinced that the windowing and detrending is a case of "researcher degrees of freedom", ie post-hoc choices that render the statistical analysis formally invalid. It’s an interesting hypothesis rather than a result.
The real test will be applying the Cox et al analysis to the CMIP6 models, although even in this case intergenerational similarity makes this a weaker challenge than would be ideal. I wonder if we have any bets as to what the results will be?