Comments on James' Empty Blog: How not to compare models to data again

I think I am not understanding. In 2a I think see ...

2012-03-31T20:38:15.890+01:00

I think I am not understanding. In 2a I think see 1 triangle below and right of dashed lines. But perhaps you are right and triangles are plotted so top apex is where the point is and that is above dashed line.

But either way on that, the horizontal dashed line is a quality control where 33% of models are rejected. AFAICS that doesn't make the remainder a 66% likely range, it just makes them the models that pass quality control test. A 66% likely range should AFAICS then be calculated using the good models. They clearly haven't done this - looks more like 99.5% range to me.

If they calculated a 66% range from good models, this would be expected to be smaller than IPCC likely range as they have only used 1 model type (HadCM3L) and using other model designs would give a wider range. I haven't seen any discussion equating a 99.5% range of good models of 1 model type to a 66% likely range of all models types. Seems a bit of a pluck a number out of the air to me. But I am probably just not understanding.

Chris, I think you are misunderstanding their meth...

2012-03-31T14:15:18.174+01:00

Chris, I think you are misunderstanding their method, which is fair enough as it seems pretty obscure and bizarre to me. From a Bayesian perspective, sure these things are rare enough that they have an extremely low probability, and will not lie in a sensible 66% range. But these guys are presenting the full range from their ensemble, after checking consistency with the historical record. If they happened to get a single model that warmed by 15C, they would use this as the uper bound on their 66% interval.

If they had applied this methodology to the original cpdn sensitivity paper, they would have had a 66% range going right up to 11C sensitivity, because they found a model up there that had a plausible climatology. Since that time, I believe they have generated models with even higher values, like 20C.

(That's assuming their climatology test was set at something they would describe as 66%, but the basic point stands whatever probability level it was labelled.)

This method does not make much sense to me, and it certainly does not appear to be compatible with the Bayesian perspective in which probability is interpreted as a degree of belief. OF course, the authors (at least some of them) have rejected the Bayesian approach in previous papers, but that rather leaves the question of what 66% actually does mean to them.

Given enough samples, quantum mechanics says earth...

2012-03-31T13:55:16.925+01:00

Given enough samples, quantum mechanics says earth can jump into middle of the sun but the average time you would have to wait for it to happen is an unbelievably high order of magnitudes longer than length of time universe has existed. So I don't think it is worth considering these extremely remote possibilities.

Back in land of sense, once a 15K global average temperature anomaly arises through natural variability then there is such a very strong feedback making reaching a 20K anomaly sufficiently unlikely that I think we should ignore it in same way as possibility of earth jumping into centre of sun.

This is probably silly enough that it isn't worth debating.

If natural variability is considered as a sample f...

2012-03-30T23:36:43.169+01:00

If natural variability is considered as a sample from a gaussian (and that's one of the most tame possibilities with very thin tails) then given enough samples, you can get a model to do literally anything.

>"under some very simple and natural assum...

2012-03-30T14:56:17.858+01:00

>"under some very simple and natural assumptions would grow without bound"

You might think such assumptions are simple and natural. I have difficulty believing there would be a model that shows over 20k temperature change by 2041-2060 that has a good hindcast goodness of fit unless there was some problem with the experiment design or an unstable computer calculating the forecast.

The question seems to be why isn't the 66% likely range calculated for each arbitary goodness of fit threshold?

Carrick, Yes some verification is needed but AFAI...

2012-03-30T12:22:17.696+01:00

Carrick,

Yes some verification is needed but AFAICS they have done this with their 'goodness of fit' measure. I don't really know how good the measure they have used is as it isn't easy to come up with a good single measure.

I would accept that models that have been tuned over long periods of time have probably had more examination on more aspects and thus CMIP models might have a greater tendancy to be good than the measure CPDN has used shows.

Re "model in a given run"
They have used initial condition ensembles where available. It would be interesting to look in more detail at that single model that has high transient temperature rise (nearly 3K for 2041-2060) as is as good as any other model per the goodness of fit measure. Was it a single run or an initial condition ensemble of how many runs? Does the output stand up to more scrutiny than just applying their goodness of fit measure?

Looking at the paper more carefully, I think it...

2012-03-30T02:07:17.344+01:00

Looking at the paper more carefully, I think it's worth a longer post. It seems that their "likely" range is simply the full range of outcomes from the sample that passes their statistical test (see the vertical lines exact match the extreme members in the "good" set) and their range is therefore heavily dependent on sample size, and under some very simple and natural assumptions would grow without bound as the ensemble grows (ie, does not asymptote to any finite value).

It is somewhat analogous to the first cpdn paper when they basically said "we found a high sensitivity model that is compatible with the current climate".

As such, I don't think there is any valid interpretation of their range, in terms of IPCC "likely" etc.

However, I might have completely misinterpreted what they did.

crandles: "Looking at figure 2a, the good mod...

2012-03-29T15:35:54.550+01:00

crandles: "Looking at figure 2a, the good models below the dotted line has a range that is not much narrower than the range for the bad models above the dotted line. "

Wouldn't you agree that you would need to look the model outputs that generated the high range values for sensitivity and see whether they are consistent with past warming?

It's not enough to say a model can in a given run generate a high sensitivity, that model must also meet some nominal verification statistics. Right?

>" It might have been interesting to addre...

2012-03-29T13:50:54.274+01:00

>" It might have been interesting to address the question of how much future warming is generated by the models which do not also overpredict the past warming."

Yes that might be interesting to see. However, ISTM they are asserting that their goodness of fit measure is better than just using 'overpredict past warming' (which is probably just a global average temperature?) because their goodness of fit also includes spatial patterns. Looking at figure 2a, the good models below the dotted line has a range that is not much narrower than the range for the bad models above the dotted line. If quality is improved a little by lowering the dotted line then the IPCC 4AR upper limit on temp change looks better with just one very good model at nearly 3C temp change.

The 66 percentile used seems an arbitary choice to make a headline drawing conclusion that the IPCC 4AR temp change range is too low at the high end. That it appears too low at the low end seems more robust.

I don't see anything in that paper to support ...

2012-03-29T05:46:20.791+01:00

I don't see anything in that paper to support the claim. It was already well known that it is possible to envisage models that match recent observations but which have a broader range of projections than the CMIP models. This Rowlands et al paper appears to merely be a variant on the same argument. It is precisely the point of my paper to illustrate why this fact cannot discredit the CMIP models.

Additionally, I don't understand the calculations underpinning this new paper. It looks from their Fig 1 that the high projections are coming from models that massively overpredicted the recent warming too. It might have been interesting to address the question of how much future warming is generated by the models which do not also overpredict the past warming.

I think I forgot to provide the link: http://www.n...

2012-03-28T18:11:30.601+01:00

I think I forgot to provide the link:
http://www.nature.com/ngeo/journal/vaop/ncurrent/full/ngeo1430.html

It is clear you do not like statements like "...

2012-03-28T14:37:59.721+01:00

It is clear you do not like statements like "the current AOGCMs may not cover the full range of uncertainty for climate sensitivity".

So I will be interested to hear your take on whether this sort of thing is now justified in light of

'Broad range of 2050 warming from an observationally constrained large climate model ensemble'
Rowlands et al 2012

Or whether you think there are major problems with that paper (whether along the lines expressed here or different problems).

I find this idea of declaring ensembles of models ...

2011-11-27T21:24:15.356+00:00

I find this idea of declaring ensembles of models with giant uncertainties to be "non-excludable" to be excruciatingly uninteresting. Let's invert the problem. How good does a model have to be on an annual basis to give us a 1C confidence interval in its prediction 100 yrs from now. That's actually interesting and testable.

Not too surprised at the PNAS thing either - howev...

2011-11-21T04:52:11.671+00:00

Not too surprised at the PNAS thing either - however their radiation map doesn't seem to mesh with the exposure we've been experiencing in Yamagata-ken, then again - they do say their program doesn't model regional Japanese well.

Chris, Well I wanted to make the link to the clim...

2011-11-20T09:49:09.862+00:00

Chris,

Well I wanted to make the link to the climate system as clear as possible. But your example is also useful in allowing us to consider the interplay between the two types of uncertainty. If asked for a prediction of the next throw, A and B may well both give probabilistic answers which are somewhat similar not only to each other but also may (with a bit of luck) be reasonably close to the "true" probability distribution of the die - there really is a "true" uncertainty here (the long-term frequency distribution of the die), which is relevant to the prediction. However, if asked for the mean of 10,000 throws, A and B will still give rather broad answers, but this time they will have a far greater spread than the intrinsic uncertainty of the experimental results - which isn't quite a delta function, but which would be pretty tight (eg if the 10,000 throws experiment was repeatedly performed). Climate sensitivity, being essentially determined by the long-term average over the attractor of a chaotic system which we don't understand very well, is essentially an example of the second case.

ac, that seems understandable to some extent, but ...

2011-11-20T09:34:33.583+00:00

ac, that seems understandable to some extent, but what would you say if someone else (perhaps even yourself in a few years time, performing a reanalysis with a better model) came up with a different system that also gave perfectly calibrated predictions but which differed from your "true" values?

As for "correct", well I'd certainly agree there is such a thing as a "correct" calculation according to Bayes but it depends on what you put in - including the prior, and there may be genuine disagreement there even if nowhere else. In practice, there is plenty of room for judgement in interpreting the available evidence too.

Nice argument. I'm not sure that the first err...

2011-11-20T00:09:30.493+00:00

Nice argument. I'm not sure that the first error is really an error.

In NWP land I use the term 'true' uncertainty to mean an uncertainty that yields well calibrated probability forecasts, i.e. neither over nor underconfident. It's not a property of me personally, but an output of a forecast system.

Of course you can't do this for climate sensitivity modelling, but I'd argue there is still a correct uncertainty that you obtain by optimally combining all the available evidence.

Hank said "When you say "uncertainty ......

2011-11-19T17:09:37.273+00:00

Hank said
"When you say "uncertainty ... is not intrinsic to the system being considered" "

Why the lengthy posts?

Consider there is a weighted dice, person A has seen 10 rolls averaging 4.5, person B has seen 8 rolls averaging 5.

The uncertainty of the system e.g an average of 10,000 rolls is very low and is not the uncertainty we are dealing with, which is that of person A, person B or perhaps of some omnipotent observer that has seen all the rolls of the dice. This uncertainty is clearly intrinsic to the person not the system.

For fun, and perhaps puts some light on the conclu...

2011-11-18T17:42:12.223+00:00

For fun, and perhaps puts some light on the conclusion that the uncertainty is not a function of the system:

"Hank - basically, the answer is that the uncertainty we are interested in is not actually a property of the earth at all.

Consider the following:

"Someone" creates a new planet. You can't examine the planet directly, but just see some of the data as it evoles its climate history (change). Your observations of the data get increasingly limited and inaccurate as you look further back in time.

The question you want to know the answer to is, what is the equilibrium sensitivity of the planet? This is a well-defined value - run the planet at 280 and 560ppm CO2 for a long time, and the temp difference will converge to a particular value, with essentially as much precision as you want. But you aren't allowed to actually do this experiment. (Alas, we're doing it.)

Your (sic - kind of prejudices the case, try "The") uncertainty in the planet's climate sensitivity is a property of you, not of the planet. The planet has a fixed value, you just don't know it. There is no "true" distribution of sensitivity (other than the delta-function at the correct value, for pedants). If you do a Bayesian estimation based on what the planet does over the 20th century or LGM etc, you have to start with a prior belief about the sensitivity, which may depend on your judgements about the "someone" who built the planet. Depending on your prior, and the particular properties of the planet, you will probably end up with a substantial uncertainty in its sensitivity - just as most people find when they do this with the real planet."

Yes, I deliberately glossed over the state/path-de...

2011-11-18T07:03:07.863+00:00

Yes, I deliberately glossed over the state/path-dependence issue and "real" stochasticity, this can indeed lead to a small uncertainty in the equilibrium achieved in a given experiment, but this uncertainty is very small indeed in model world, and I am sure that most scientists think it to be small in the real world too - so long as we are only talking about moderate differences in climate state, compatible with the present day climate.

Regarding the example of the true sensitivity dist...

2011-11-17T17:04:44.867+00:00

Regarding the example of the true sensitivity distribution of the black-box climate model: If the model were stochastic and path-dependent, it could be that different realizations would converge to different equilibrium temperatures for the same forcings. Therefore one could argue that its sensitivity does have some true distribution.

If this is also true of the real system, it seems that it would amplify the point of your original claim against truth-centrism ("First, since we don't know the truth (in the widest sense) we have no possible way of generating models that scatter evenly about it."). Even if we did know the truth, i.e. we had an exact replica of the black box model, up to stochastic inputs, and could generate an ensemble that reflected the true distribution of outcomes, there's no reason to think that 'reality' (a single realization of the original model) would lie at the center of that.

Um...not really too surprised by the PNAS thing. I...

2011-11-17T06:33:24.456+00:00

Um...not really too surprised by the PNAS thing. I don't think it is really that big news. AIUI the contamination is about where people thought it was, at about the level they thought it was.

Bit of an aside, but do you or Jules have any obse...

2011-11-17T03:16:55.095+00:00

Bit of an aside, but do you or Jules have any observations on the recent PNAS papers on soil contamination from Fukushima?

Refs: http://www.pnas.org/content/early/2011/11/09/1111724108.full.pdf

http://www.pnas.org/content/early/2011/11/11/1112058108.full.pdf

Eamon

Tom, this is exactly the argument I have tried to ...

2011-11-17T01:42:47.579+00:00

Tom, this is exactly the argument I have tried to make in writing, in this manuscript. However, I haven't worked out how to publish it (was rejected by GRL, no real surprise there).

I would certainly not claim to have proved that the ensemble is definitely broad enough - fundamentally, I think this is basically unprovable (and indeed false at some level of detail - eg the raw model predictions for hurricanes are generally wrong, cos they don't resolve them). But most of the claims to have shown that the model spread is grossly inadequate in large-scale average variables such as temperature trend are not well-founded IMO.

Hank - basically, the answer is that the uncertain...

2011-11-17T01:36:42.115+00:00

Hank - basically, the answer is that the uncertainty we are interested in is not actually a property of the system at all.

Consider the following:

"Someone" creates a new climate model and compiles it into an executable. You can't examine the code directly, but just see some of the outputs as it simulates the evolution of historical climate (change). Your observations of the outputs get increasingly limited and inaccurate as you look further back in time.

The question you want to know the answer to is, what is the equilibrium sensitivity of the model? This is a well-defined value - run the model at 280 and 560ppm CO2 for a long time, and the temp difference will converge to a particular value, with essentially as much precision as you want. But you aren't allowed to actually do this experiment.

Your uncertainty in the model's climate sensitivity is a property of you, not of the model. The model has a fixed value, you just don't know it. There is no "true" distribution of sensitivity (other than the delta-function at the correct value, for pedants). If you do a Bayesian estimation based on what the model does over the 20th century or LGM etc, you have to start with a prior belief about the sensitivity, which may depend on your judgements about the "someone" who built the model. Depending on your prior, and the particular properties of the model, you will probably end up with a substantial uncertainty in its sensitivity - just as most people find when they do this with the real world.