Thursday, June 24, 2010

When is the mean better than all models?

Well, Belette is acting like a stoat with a bone (or at least like a rat with a bone; I don't really know much about stoat behaviour) on the question that he considers most important: when is the multi-model mean better than all individual models?

Clearly, unlike the previous question, this one is not amenable to a simple proof. It is easy to generate situations where the mean is best, and also situations where it is not. Thus, I find it less interesting :-) We looked at it in probabilistic terms: what is the probability that a particular model is better than the mean?
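The following toy Python sketch shows that either outcome is easy to construct. It uses two hand-picked 2-D ensembles with invented numbers, not taken from the paper. In the first, the mean beats every model. In the second, it does not. The helper names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ensemble_mean(models):
    """Coordinate-wise mean of a list of model vectors."""
    n = len(models)
    return [sum(coords) / n for coords in zip(*models)]


def mean_beats_all(models, obs):
    """True if the ensemble mean is strictly closer to obs than every model."""
    err_mean = sq_dist(ensemble_mean(models), obs)
    return all(err_mean < sq_dist(m, obs) for m in models)


obs = [0.0, 0.0]
# Models scattered symmetrically around the obs: the mean lands on the obs.
surrounding = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
# One model almost on the obs, the others far off to one side.
lopsided = [[0.1, 0.0], [3.0, 0.0], [3.0, 1.0]]

print(mean_beats_all(surrounding, obs))  # True
print(mean_beats_all(lopsided, obs))     # False
```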

Here's a plot of the results. Focus on the solid dark blue line for starters.

"Perfect" in the legend refers to the concept of models and observations being drawn from the same distribution. The solid blue line shows the probability of a sample from the standard multivariate d-dimensional Normal N(0,1)^d being closer to the observations (which are also a sample from N(0,1)^d) than the mean of the sampling distribution, 0 = (0)^d, is. The main point is that this probability depends strongly on the value of d, which I think is intuitively quite obvious. For each dimension (= Gaussian deviate), the mean squared error of the ensemble mean is 1 and the mean squared error of a random model is 2. So the more variables you have, the larger the gap in the expected sum of these terms and the less chance that random variation will result in the former sum actually exceeding the latter for a given random sample.
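Below is a minimal Monte Carlo sketch of the "perfect" case. It assumes that model and observations are independent draws from N(0,1)^d and that the ensemble mean is exactly the distribution mean (the origin).

The second function uses a short reduction that is not given in the post:

- ||y - x||^2 < ||x||^2 is equivalent to ||y||^2 < 2 y.x.
- Given y, the quantity y.x/||y|| is standard normal.
- So the probability is the average of the normal upper tail Q(||y||/2).

For d = 1 this gives exactly arctan(2)/pi, about 0.352. The function names are hypothetical.

```python
import math
import random


def prob_model_beats_mean(d, n_trials=10000, seed=0):
    """Monte Carlo estimate of P(||model - obs||^2 < ||mean - obs||^2), where
    model and obs are independent N(0,1)^d draws and the mean is the origin."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        obs = [rng.gauss(0.0, 1.0) for _ in range(d)]
        model = [rng.gauss(0.0, 1.0) for _ in range(d)]
        err_model = sum((m - o) ** 2 for m, o in zip(model, obs))  # expectation 2d
        err_mean = sum(o * o for o in obs)                        # expectation d
        if err_model < err_mean:
            hits += 1
    return hits / n_trials


def prob_semi_analytic(d, n_trials=10000, seed=1):
    """Same probability via E[Q(R/2)], with R = ||model|| (chi with d dof)
    and Q(z) = 0.5 * erfc(z / sqrt(2)) the standard normal upper tail."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        r = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
        total += 0.5 * math.erfc(r / (2.0 * math.sqrt(2.0)))
    return total / n_trials


for d in (1, 2, 5, 10, 20):
    print(d, round(prob_model_beats_mean(d), 3), round(prob_semi_analytic(d), 3))
```

Both columns should fall steadily as d grows. That is the dependence on dimension described above.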

The two other dark blue lines arise from different shapes of sampling distribution, where instead of being perfectly isotropic in the metric space, the variance is divided unequally across the dimensions. One case is geometric, N(0,p^i) for some p less than 1, and the other is a square-root variation, N(0,sqrt(i)), where i indexes the individual deviates. Here we need the concept of "effective" dimension, which we copied from another paper (Bretherton et al.). The conclusion is that the shape of the distribution matters little compared to the effective dimension.

Red and cyan results correspond to the cases where the truth is sampled from a distribution which is either narrower or broader than the models. It may initially seem counterintuitive that a "bad" ensemble (where the truth is miles away) has more relatively "good" models (better than the mean), but you can show this easily just by drawing a quick sketch: draw a rough oval for the ensemble, and consider the truth as being either close to the mean or miles away to one side. In each case, whether a model is better or worse than the mean is demarcated by a circle centred on the truth and passing through the ensemble mean. Any model inside the circle is better than the ensemble mean. When the truth is miles away from the ensemble, this circle will contain a larger proportion of the ensemble volume, up to a limit of 50% for a very distant truth.
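Below is a sketch extending the same experiment to unequal variances and to a truth drawn narrower or broader than the models. The following assumptions are not taken from the paper's code:

- The second argument of N(0, .) is treated as a variance.
- Effective dimension is taken as (sum of variances)^2 / (sum of squared variances). This is one of the measures discussed by Bretherton et al., and it equals d for an isotropic spectrum.
- The truth is the model distribution scaled by a hypothetical `truth_scale` (below 1 is narrower, above 1 broader).
- The truth can optionally be shifted along the first axis by a hypothetical `truth_offset`, to mimic "miles away".

```python
import math
import random


def effective_dimension(variances):
    """(sum of variances)^2 / (sum of squared variances); equals d when all
    d variances are equal and is smaller when variance is concentrated."""
    s1 = sum(variances)
    s2 = sum(v * v for v in variances)
    return s1 * s1 / s2


def geometric_spectrum(d, p):
    """Variances p**i for i = 1..d (the 'geometric' case, p < 1)."""
    return [p ** i for i in range(1, d + 1)]


def sqrt_spectrum(d):
    """Variances sqrt(i) for i = 1..d (the 'square root' case)."""
    return [math.sqrt(i) for i in range(1, d + 1)]


def prob_model_beats_mean(variances, truth_scale=1.0, truth_offset=0.0,
                          n_trials=10000, seed=0):
    """Monte Carlo P(model closer to the truth than the ensemble mean, the origin).
    Models ~ N(0, diag(variances)); truth ~ truth_scale * N(0, diag(variances)),
    shifted by truth_offset along the first axis."""
    rng = random.Random(seed)
    sds = [math.sqrt(v) for v in variances]
    hits = 0
    for _ in range(n_trials):
        truth = [truth_scale * rng.gauss(0.0, s) for s in sds]
        truth[0] += truth_offset
        model = [rng.gauss(0.0, s) for s in sds]
        err_model = sum((m - t) ** 2 for m, t in zip(model, truth))
        err_mean = sum(t * t for t in truth)
        # The model beats the mean iff it lies inside the sphere centred on
        # the truth that passes through the mean.
        if err_model < err_mean:
            hits += 1
    return hits / n_trials


print(effective_dimension([1.0] * 10))                   # 10.0
print(effective_dimension(geometric_spectrum(20, 0.5)))  # close to 3
for scale in (0.5, 1.0, 2.0):
    print(scale, prob_model_beats_mean([1.0] * 3, truth_scale=scale))
print(prob_model_beats_mean([1.0] * 5, truth_offset=100.0))  # close to 0.5
```

To probe the claim that shape matters less than effective dimension, compare two runs:

- `prob_model_beats_mean(geometric_spectrum(20, 0.5))`
- the isotropic case with d near 3.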

The remaining piece in the puzzle is to consider the effective dimension of the distribution in the actual case under consideration - the CMIP3 models and the climate system. Is it low, in which case we should have many models better than the mean, or high, in which case we shouldn't find (m)any at all? And can we say anything about whether the ensemble is relatively broad (meaning few models better than the ensemble mean) or too narrow (lots of them)? Stay tuned for the next exciting instalment...I'm so excited I can hardly sleep.
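The sketch below shows one way the effective dimension of an actual ensemble might be estimated, under stated assumptions. It uses the same (sum of eigenvalues)^2 / (sum of squared eigenvalues) measure. This is computed from the sample covariance C without an eigendecomposition, as tr(C)^2 / ||C||_F^2. That works because C is symmetric, so its squared Frobenius norm equals the sum of its squared eigenvalues.

With n members, the estimate cannot exceed n - 1, the rank of the sample covariance. A small ensemble therefore caps what it can reveal. The function names and the random data are hypothetical.

```python
import random


def sample_covariance(members):
    """Unbiased sample covariance matrix (d x d) of an ensemble of d-vectors."""
    n, d = len(members), len(members[0])
    mean = [sum(c) / n for c in zip(*members)]
    anoms = [[x - mu for x, mu in zip(m, mean)] for m in members]
    return [[sum(a[i] * a[j] for a in anoms) / (n - 1) for j in range(d)]
            for i in range(d)]


def effective_dimension_from_cov(cov):
    """tr(C)^2 / ||C||_F^2, i.e. (sum of eigenvalues)^2 / (sum of squared
    eigenvalues) for a symmetric matrix C."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    frob2 = sum(c * c for row in cov for c in row)
    return trace * trace / frob2


rng = random.Random(3)
# Hypothetical ensemble: 20 members, 30 isotropic dimensions.
members = [[rng.gauss(0.0, 1.0) for _ in range(30)] for _ in range(20)]
print(effective_dimension_from_cov(sample_covariance(members)))  # at most 19
```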


crandles said...

And it is so exciting because you have already concluded

"we find that the CMIP3 ensemble generally provides a rather good sample under the statistically indistinguishable paradigm, although it appears marginally over-dispersive and exhibits some modest biases"

Now I wonder whether a different conclusion would be exciting, or would make you less inclined to blog about it, or whether there was some sarcasm about excitement? I think I will go with guessing that it is a very similar or identical conclusion.

Distinguishing between overdispersive because of too much weather noise and overdispersive because the trends have too great a variation seems important. I am wondering if comparing the MMM to initial-condition ensemble means might show different results than comparing the MMM to individual ensemble members. Does the comparison of both provide additional useful information? Or am I asking silly questions?

James Annan said...

I only really blogged it because people kept should be interesting to those working in the area but not of any earth-shattering significance.

EliRabett said...

You will never be the next Steve McIntyre with that attitude

David B. Benson said...

Please don't shatter the earth.

It's the only one we have...

Carrick said...

It seems to me that the MMM will be better than any of the individual models only when it is not "nearly" centered on truth (I'm thinking the criterion for "nearly" should be a derivable quantity using the framework you've provided). Center it on truth, and you are bound to have some individual models that are better than the MMM.

Doesn't that follow from your derived result that:

"the squared distance from obs to multi-model mean is less than the mean of the squared distances from the individual models to the obs, by an amount which equals the average of the squared distances from the models to their mean?"
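The quoted result is a bias-variance style decomposition:

|mean - obs|^2 = (1/n) sum_i |m_i - obs|^2 - (1/n) sum_i |m_i - mean|^2

Below is a quick numerical check on arbitrary random data; the data and function names are hypothetical. The identity only compares the mean with the average model error. It says nothing directly about the best single model, so on its own it cannot settle the "better than all models" question.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decomposition(models, obs):
    """Return (error of the multi-model mean, mean model error - mean spread)."""
    n = len(models)
    mmm = [sum(c) / n for c in zip(*models)]
    err_mmm = sq_dist(mmm, obs)
    mean_model_err = sum(sq_dist(m, obs) for m in models) / n
    mean_spread = sum(sq_dist(m, mmm) for m in models) / n
    return err_mmm, mean_model_err - mean_spread


rng = random.Random(42)
obs = [rng.gauss(0.0, 1.0) for _ in range(8)]
models = [[rng.gauss(0.5, 1.0) for _ in range(8)] for _ in range(15)]
lhs, rhs = decomposition(models, obs)
print(lhs, rhs)  # equal up to rounding
best = min(sq_dist(m, obs) for m in models)
print("best single model beats MMM:", best < lhs)
```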

James Annan said...

No, the MMM is likely to be better than all models when the ensemble is truth-centred (or close to it), because the models will be around one standard deviation from the truth but the MMM will be substantially closer. That's what the red lines in the plot show. It also depends on the number of dimensions though.
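The reply above can be illustrated with a Monte Carlo sketch of the quantity in the post's title: the chance that the multi-model mean beats every one of n models. It rests on these assumptions:

- Models are N(0,1)^d draws.
- The truth is a hypothetical `truth_scale` times an N(0,1)^d draw, so `truth_scale = 0` means exactly truth-centred.
- The dimension and ensemble size are arbitrary illustrative choices.

```python
import random


def prob_mean_beats_all(d, n_models, truth_scale, n_trials=2000, seed=0):
    """Monte Carlo P(multi-model mean is closer to the truth than every model)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        truth = [truth_scale * rng.gauss(0.0, 1.0) for _ in range(d)]
        models = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_models)]
        mmm = [sum(c) / n_models for c in zip(*models)]
        err_mmm = sum((a - t) ** 2 for a, t in zip(mmm, truth))
        # The mean "wins" only if no single model is at least as close.
        if all(err_mmm < sum((a - t) ** 2 for a, t in zip(m, truth)) for m in models):
            wins += 1
    return wins / n_trials


for scale in (0.0, 0.5, 1.0, 2.0):
    print(scale, prob_mean_beats_all(d=10, n_models=20, truth_scale=scale))
```

With the truth at the centre, the models sit roughly sqrt(d) away from it, while the mean is only about sqrt(d/n) away. A broad or distant truth reverses this.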

Carrick said...

James, you're absolutely right... if the MMM is exactly centered on the truth, by definition it will do as well as or better than any of the individual models. Oops.

William M. Connolley said...

"Stoat with a rabbit's throat", surely. But thank you for doing this. As far as I can see it looks entirely plausible, and the results seem to fit what one finds for low-dimensions.

I actually think this is more interesting than the previous result, so I hope you'll pad out your paper with it.

James Annan said...

Oh this (and associated analysis of the IPCC models) makes up the bulk of the manuscript - I didn't think the first result was enough to publish by itself, especially as I thought at the time it was pretty widely known.