Saturday, July 15, 2006

More on detection, attribution and estimation 1: Confidence intervals and credible intervals

I wrote this some time ago, and have been occasionally looking into the D&A stuff since then. It all seems rather murkier than I had expected... this will take a lot of writing to explain, so I'm splitting it into parts. This part is an introduction to the distinct concepts of confidence intervals and credible intervals.

To recap, D&A is an essentially frequentist procedure, which seeks to determine whether the observational record is "statistically inconsistent" with what we would have expected in the absence of anthropogenic influence, and "not statistically inconsistent" with what we think the effect of anthropogenic influence should have been.

It is fundamentally a frequentist approach (comparing the real data to the population of model outputs which differ due to natural variability). Therefore, it has no direct interpretation as a probabilistic estimate of the magnitude, or even the existence, of the anthropogenic influence. Such estimates are absolutely incompatible with the frequentist paradigm. This is the fundamental message of these posts, and I can't stress it strongly enough.
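To make the logic concrete, here is a minimal sketch in Python of what such a frequentist consistency check boils down to. All the numbers are invented purely for illustration, and real D&A studies use rather more sophisticated (multivariate fingerprint) methods than this bare trend comparison:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trends from "control runs" (natural variability only).
# These numbers are made up for illustration, not from any real ensemble.
control_trends = rng.normal(loc=0.0, scale=0.05, size=1000)  # deg C / decade

observed_trend = 0.17  # an invented "observed" trend, deg C / decade

# Frequentist consistency check: how often does natural variability
# alone produce a trend at least as extreme as the observation?
p_value = np.mean(np.abs(control_trends) >= abs(observed_trend))
print(f"p-value under the natural-variability null: {p_value:.3f}")
# A tiny p-value means the record is "statistically inconsistent" with
# unforced variability -- detection, in this framing.  Note that it
# says nothing directly about P(anthropogenic influence | data).
```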

Unfortunately, this distinction has been badly blurred in some of the literature, including even the IPCC TAR itself. In fact, there seems to be rather widespread confusion between a (frequentist) Confidence Interval and a (Bayesian) Credible Interval, so I will expand on this now. A confidence interval (warning: the usual web reference pages may be a bit dodgy, precisely due to the confusion I'm discussing) for a parameter x is an interval constructed according to a specific method such that, if we were to repeat the experiment numerous times, with a new set of observational data (with different random errors) each time, then p% of the confidence intervals constructed by this method would contain the true (fixed) value of x, whatever that is.
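To see the coverage property in action, here is a toy Python simulation (nothing to do with climate: just repeated estimation of a normal mean with known error variance):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_x = 2.0               # the fixed but unknown parameter
n, sigma = 20, 1.0         # sample size and (known) error standard deviation
z = stats.norm.ppf(0.975)  # ~1.96, for a nominal 95% interval

trials, hits = 10_000, 0
for _ in range(trials):
    data = rng.normal(true_x, sigma, n)  # a fresh set of observational errors
    half_width = z * sigma / np.sqrt(n)
    lo, hi = data.mean() - half_width, data.mean() + half_width
    hits += (lo <= true_x <= hi)

# ~0.95: the 95% is a property of the *procedure* over repeated
# experiments, not a probability statement about any one interval.
print(f"coverage: {hits / trials:.3f}")
```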

This is not the same thing as an interval such that we believe x lies in it with probability p! For a frequentist, to make a probabilistic statement about x is to commit a category error: x is a fixed but unknown parameter, and it either lies in the specific interval or it doesn't. The randomness belongs to the data, and hence to the interval endpoints, not to x itself.

A credible interval (which can also be abbreviated to CI: how confusing) is an inherently Bayesian concept: it is an interval such that the parameter is believed to lie in it with probability p. Fundamentally, the belief (probability) attaches to the person making the statement, rather than to the parameter itself - in other words, it is subjective. In cases where many people agree on a particular credible interval (i.e. because they share similar judgements about priors, methods and data), it is sometimes called intersubjective.

Calling a credible interval objective is potentially rather misleading IMO - it may be taken as implying that there is a true probability that we could discover through sufficiently careful analysis (analogous to the probability of 4 heads in 5 coin tosses, say). However, the only "true" probability that could apply in this sense is 0 or 1 - the event is either going to happen, or not (NB "going to" can apply to past events which are currently unknown to me, such as whether it rained in Birmingham on this day last year). The probability here applies to our belief in the truth of an unknown proposition, not to any frequentist limit as the number of replications increases. Different people may quite reasonably differ in their opinions, without any one of them being wrong!
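For illustration, here is the textbook conjugate calculation for a normal mean with known error variance, in Python. The data and the two priors are invented; the priors stand in for two analysts with different but defensible prior judgements:

```python
import numpy as np
from scipy import stats

# Made-up observations of a parameter x, with known error sd sigma.
data = np.array([1.8, 2.3, 1.9, 2.1, 2.4])
sigma = 0.5
n, xbar = len(data), data.mean()

def credible_interval(prior_mean, prior_sd, p=0.95):
    # Standard conjugate update for a normal mean with known variance.
    post_var = 1.0 / (1.0 / prior_sd**2 + n / sigma**2)
    post_mean = post_var * (prior_mean / prior_sd**2 + n * xbar / sigma**2)
    return stats.norm.interval(p, loc=post_mean, scale=np.sqrt(post_var))

# Two analysts with different (but reasonable) priors report different
# 95% credible intervals for the same data -- neither is "wrong":
print(credible_interval(prior_mean=0.0, prior_sd=1.0))
print(credible_interval(prior_mean=2.0, prior_sd=0.2))
```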

It seems quite clear from a bit of web-surfing (and a literature trawl) that the vast majority of people automatically and intuitively interpret confidence intervals as credible intervals: that is not so surprising, as the precise definition of the confidence interval is rather complex, counterintuitive, and not very useful in real life (in contrast, what people actually want to know is well-encapsulated by the notion of a credible interval). Moreover, even authoritative literature specifically on the subject of statistics frequently gets it wrong (as I'll show later). However, the two concepts are not the same, and it is trivial to construct confidence intervals that are in no way credible. I'll give some examples of this in part 2.
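In the meantime, as an appetiser, here is the classic textbook pathology (a standard toy construction, not necessarily one of the examples to come in part 2):

```python
import numpy as np

rng = np.random.default_rng(2)

# The classic construction: ignore the data entirely; with probability
# 0.95 report the whole real line, otherwise report an empty interval.
def silly_ci(_data=None):
    if rng.random() < 0.95:
        return (-np.inf, np.inf)  # certainly contains x
    return None                   # certainly does not

# By construction this has exact 95% coverage over repeated experiments,
# so it is a perfectly valid 95% confidence procedure.  Yet once you see
# a particular interval, your belief that it contains x is 1 or 0 --
# it is in no way a 95% credible interval.
hits = sum(silly_ci() is not None for _ in range(10_000))
print(hits / 10_000)  # ~0.95
```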

4 comments:

Anonymous said...

So is this your latest jihad? I guess blogging & yet another tirade is more fun for you than actually doing any original work.

It's amusing you'll give (at worst) mild support to Mann et al, but you feel the need to go off and now bash detection & attribution studies. All because, what, you can't join the party? James Quixote & Sancho Hargreaves ride again! ;-)

James Annan said...

Since these matters are in fact intimately related to my work, it is important for me to get them right. Writing them down helps to clarify my thoughts, and who knows, maybe it will help other people too. In fact these posts draw on an informal internal seminar I gave a few weeks ago.

I'm puzzled as to where you get "give (at worst) mild support to Mann" from - I have barely commented at all on his work. And why do you act so threatened by what I have to say? If this is a "tirade", you have a thin skin indeed.

Hank Roberts said...

Chuckle.

You said: "even authoritative literature specifically on the subject of statistics frequently gets it wrong (as I'll show later)."

You realize this statement will be taken as an attack by those who only recently learned to rely on statisticians in the fight against paleoclimatologists.

James Annan said...

ankh,

I just provide the tea-leaves; the reading and interpretation is left to others :-)