At long last jules and I have managed to submit the written version of a talk that I have given (bits of) no fewer than four times over the last few years (at NCAR, UKMO/HC, Schloss Ringberg and EGU). It had to get re-written several times and sit at the back of my mind on long bike rides and runs before it became acceptably coherent (to us, at least). I’m curious as to what people will think of it – it seemed to go down ok at the talks but sometimes it’s hard to tell…

The topic is "independence" as it pertains both to the understanding of ensembles such as CMIPn and to constraints on climate sensitivity. Our main point overall is that if you want to talk about independence in any context, you really need to present a mathematical/statistical formalisation that relates directly to the standard probabilistic definition: events A and B are independent iff P(A,B) = P(A)P(B). That is, the probability of both A and B is the probability of A multiplied by the probability of B. This generalises to conditional independence: events A and B are conditionally independent given S iff P(A,B|S) = P(A|S)P(B|S).

A more practically useful (but mathematically equivalent) formulation is that events A and B are conditionally independent given S iff P(A|S) = P(A|B,S) – that is, if the conditional probability of A given both S and B is the same as the conditional probability of A given S. What this means in practice to an individual researcher is that, starting from their (probabilistic) prediction of A given knowledge of S, conditional independence of A and B rests on whether knowledge of B, in addition to knowing S, does or does not change their prediction of A. While this is no more than elementary probability theory, it seems to be an intuitively attractive way of addressing the question, eg in the case where A and B are observational constraints on the equilibrium climate sensitivity S.

A point we also make in the paper is that these P()s are fundamentally subjective things just as much as a Bayesian prior is. There is no way of validating what observations should be seen in the case where S takes a value different from the real world; this counterfactual can only really exist in our heads and not in reality. In practice we often use models for this, which are themselves subjective creations, and in the paper we present an example to show how independence and non-independence of constraints can be investigated in the context of a toy model.
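To make the two equivalent definitions concrete, here is a minimal sketch in Python with NumPy. It is not the toy model from the paper: the binary variables and all the probabilities are made-up numbers chosen purely for illustration. It builds a joint distribution over (S, A, B) in which A and B are conditionally independent given S, checks that P(A|S) = P(A|B,S), and shows that the same A and B are nonetheless not independent once S is marginalised out.

```python
import numpy as np

# Hypothetical numbers: S is a binary "sensitivity" (0 = low, 1 = high),
# A and B are binary observational outcomes.
p_s = np.array([0.5, 0.5])            # P(S)
p_a_given_s = np.array([0.2, 0.8])    # P(A=1 | S=s)
p_b_given_s = np.array([0.3, 0.9])    # P(B=1 | S=s)


def joint_conditionally_independent(p_s, p_a1, p_b1):
    """Build P(S, A, B) with A and B conditionally independent given S."""
    joint = np.zeros((2, 2, 2))       # indexed as [s, a, b]
    for s in range(2):
        pa = np.array([1 - p_a1[s], p_a1[s]])
        pb = np.array([1 - p_b1[s], p_b1[s]])
        joint[s] = p_s[s] * np.outer(pa, pb)   # P(s) P(a|s) P(b|s)
    return joint


def is_conditionally_independent(joint, tol=1e-12):
    """Check P(A | S) == P(A | B, S) for every (s, b) with P(s, b) > 0."""
    p_sa = joint.sum(axis=2)                        # P(S, A)
    p_a_s = p_sa / p_sa.sum(axis=1, keepdims=True)  # P(A | S), shape [s, a]
    p_sb = joint.sum(axis=1)                        # P(S, B), shape [s, b]
    for s in range(2):
        for b in range(2):
            if p_sb[s, b] == 0:
                continue                            # conditioning undefined
            p_a_bs = joint[s, :, b] / p_sb[s, b]    # P(A | B=b, S=s)
            if not np.allclose(p_a_bs, p_a_s[s], atol=tol):
                return False
    return True


def is_marginally_independent(joint, tol=1e-12):
    """Check P(A, B) == P(A) P(B) after summing out S."""
    p_ab = joint.sum(axis=0)
    p_a = p_ab.sum(axis=1)
    p_b = p_ab.sum(axis=0)
    return np.allclose(p_ab, np.outer(p_a, p_b), atol=tol)


joint = joint_conditionally_independent(p_s, p_a_given_s, p_b_given_s)
ci = is_conditionally_independent(joint)
mi = is_marginally_independent(joint)
print("conditionally independent given S:", ci)
print("independent without conditioning on S:", mi)
```

With these numbers, knowing B tells you nothing more about A once S is known, but if S is unknown then B is informative about A, simply because both depend on S.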

As for the question of model independence, the notation may be easier to interpret if we change the symbols and write something like P(M1|T) = P(M1|M2,T). This equation asserts that the models M1 and M2 are conditionally independent given the truth T. This is essentially the foundation of the truth-centred approach, and it would be great if true, but clearly many analyses of the models in CMIP ensembles have shown that it is not reasonable. An alternative conditional independence formulation, which we think is more relevant and interesting, is whether models are independent, *conditional on the distribution of models*. We illustrate how this does seem to encapsulate much of the discussion of model similarity, in that models from different research centres seem independent whereas pairs of models from the same research centres do not, according to a fairly straightforward analysis of model similarity.

It is quite possible – likely – that some others will be able to improve on how we’ve tried to define independence, but our point was really to argue that in principle we must use a mathematical foundation in order to make any meaningful progress – and also observe that mathematical definitions do exist which seem to match at least some real-world usage reasonably well.
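For readers who like to see such things numerically, here is a rough sketch of what P(M1|T) = P(M1|M2,T) does and does not allow. It is not the analysis in the paper. It is a synthetic Gaussian example in which every number, and the idea of a "shared centre error", is an assumption made for illustration. Two "models" whose errors about the truth are drawn separately have uncorrelated errors, so M2 adds nothing to a prediction of M1 once T is known. Two models that share an error component do not have this property.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 20_000                      # hypothetical number of grid points

truth = rng.normal(size=n_points)      # synthetic "truth" field T


def model_output(truth, shared_error, rng, own_sd=1.0):
    """Synthetic model: truth plus a shared error plus its own error."""
    own_error = rng.normal(scale=own_sd, size=truth.size)
    return truth + shared_error + own_error


# Two models with no shared error component (truth-centred case).
no_shared = np.zeros(n_points)
m1 = model_output(truth, no_shared, rng)
m2 = model_output(truth, no_shared, rng)

# Two models that inherit the same error component (assumed, e.g. shared code).
centre_error = rng.normal(scale=1.0, size=n_points)
m3 = model_output(truth, centre_error, rng)
m4 = model_output(truth, centre_error, rng)


def error_correlation(a, b, truth):
    """Correlation of the two models' errors about the truth."""
    return np.corrcoef(a - truth, b - truth)[0, 1]


r_different = error_correlation(m1, m2, truth)
r_same = error_correlation(m3, m4, truth)
print(f"no shared error:   r = {r_different:.3f}")
print(f"with shared error: r = {r_same:.3f}")
```

In this Gaussian setup zero error correlation is the same thing as conditional independence given T, which is why a correlation is enough here. With equal variances for the shared and individual errors, the second correlation should come out near 0.5.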

## 12 comments:

I think the first comma on line 6 is redundant.

Well done on reading that far :-)

To be very slightly more constructive, in "Mi (where 1 < i < n indexes the different models)" should be "<=" not "<" I think.

Oh yes you're right that's a bit silly of me. Let's pretend it was a deliberate test to see if the reviewers read it all :-)

PS phew what a scorcher it is down south!

PS it's warmish here in Yorkshire too!

>"That is, if the researcher does not know how to use the additional information A in order to better predict B, then A and B are conditionally independent to that researcher. Thus, ignorance implies independence"

Not knowing how to use the information does not seem quite the same as ignorance: you might know how to use the information but have not yet done the work, or you might have vague thoughts that there may be possible way(s) to use the information but no precise plan. This may be obvious enough without needing to be stated. Yet I wonder if "ignorance implies independence" is enough of a potential error that it needs to be stated more carefully. Perhaps something more like: lack of a way to test if the information may be useful implies independence. (Maybe state this once, then indicate that 'ignorance is independence' is only a paraphrase of it.)

Is 'lack of a way to test if the information may be useful implies independence' more correct or just too pedantic or ...?

.

"We do not believe that a coincidentally similar behaviour should be penalised by downweighting of these models, as it may represent a true ‘emergent constraint’ on system behaviour."

seems like an important finding/consideration to emphasise but perhaps that is just me.

I am wondering if there should be more discussion of the factors and their strength leading you to conclude that CNRM and one of the GFDL models should be considered independent despite the similarity shown by your tests.

I suspect your conclusion is likely to be sound with only support in the literature for the prior, it would take a lot of evidence to overturn that view. Should there be discussion of this and perhaps other factors?

I am not sure if it would be possible to consider the situation with 18 independent models and no emergent constraints. With 153 possible pairs, how many would show unusual similarities by chance? Can this be compared to what you found? Presumably you would find more similarities, but we wouldn't know whether this was due to emergent constraints or to similar algorithms being implemented. So even if possible, does this get us nowhere, or is it interesting to know whether the situation is "should find 7, actually found 30" or "should find 7, actually found 10"? If it was the latter, I would be more inclined to believe the 18 models are independent and any similarities are chance or emergent constraints.

Maybe that is impossible or too much work or... so maybe there is a different approach: 25 models, 300 pairs. Suppose you didn't know which was which and simply selected the n pairs which are most similar. How big does n have to be to successfully find 8 or 9 of the pairs that are expected to be too similar? I think I am trying to ask if there is a big jump in the level of differences, and does this tell us anything? I am thinking a big jump in the level of differences would tend to make us more confident that the 18 different centres are independent and the models from the same modelling centres are not? I guess there is some contamination from the level of emergent constraints, but it just might still be interesting to see what numbers come out?

That's a couple of exercises. Ideally, if the confidence that comes out of different methods is contaminated by the level of emergent constraints in different ways, maybe that could be useful?

Thanks for the interesting comments. Was busy at this cunningly-acronymed meeting until recently. It is curious how I managed to write and re-write this for literally years, finally get it off my desk and immediately think how I could have written some things better (even excluding typos)! Eg one thing I probably should have emphasised is that all Bayesian probabilities are conditional on the background knowledge of the researcher, ie for P(A) we should always read P(A|Omega) for some Omega, even if it is not immediately obvious whether or how Omega is informative. There is no such thing as pure ignorance.

I do suspect that saying ignorance implies independence is likely to raise some hackles. I am trying to promote a fundamentally subjective approach though, really as much to see what happens as anything else. We have to take responsibility for our probabilities and that also means taking responsibility for our judgements of independence. OTOH it may be a bit of a weakness of the Bayesian paradigm that we need perfect computational abilities in order to be coherent. There isn't much room for "I ought to change my probabilities but I don't yet know what to change them to".

Non-independent models can be very dependent or hardly dependent, but there still needs to be some qualitative threshold at which they are truly independent (conditional on whatever explicit knowledge we add to our own background Omegas...).

Fundamentally, my consideration of CNRM and GFDL was based on the assumption that I wouldn't have predicted similarities prior to looking at the output. It really needs tested with another generation, ie for which pairs do I actually predict the output of the second model better, based on the first? If I knew that they used similar sub-models or even that they used similar tuning procedures, I might make a different prediction...but on the other hand, maybe it was just luck that the fields I looked at happened to agree, I don't think Masson and Knutti said they were particularly similar by their metric.

Thank you for the reply, maybe it will be interesting to watch out for those hackles.

Interesting POV. Eli's prior is that v2 of a model would be dependent on v1. That could be an interesting test of your idea, comparing two consecutive versions of a model with a pair formed by one of them with a number of other models.

I am wondering what part of James's "It really needs tested with another generation, ie for which pairs do I actually predict the output of the second model better, based on the first?" the bunny missed or didn't understand, or is he just getting into repeating things in his own words?

:-)

Actually there's a bit of a conflation of two things there. In some cases two gens are already in the same CMIP ensemble, eg CSIRO 3.0 and 3.5. But the assessment can certainly be made across as well as within CMIP iterations.
