Or, why Nic Lewis is wrong.

Long time no post, but I've been thinking recently about climate sensitivity (about which more soon) and was provoked into writing something by this post, in which Nic Lewis sings the praises of so-called "objective Bayesian" methods.

Firstly, I'd like to acknowledge that Nic has made a significant contribution to research on climate sensitivity, both through identifying a number of errors in the work of others (e.g. here, here and most recently here) and through his own contributions in the literature and elsewhere. Nevertheless, I think that what he writes about so-called "objective" priors and Bayesian methods is deeply misleading. No prior can encapsulate no knowledge, and underneath these bold claims there is always a much more mealy-mouthed explanation in terms of a prior having "minimal" influence, and then you need to have a look at what "minimal" really means, and so on. Well, such a prior may or may not be a good thing, but it is certainly not what I understand "no information" to mean. I suggest that "automatic" is a less emotive term than "objective" and would be less likely to mislead people as to what is really going on. Nic is suggesting ways of automatically choosing a prior, which may or may not have useful properties.

[As a somewhat unrelated aside, it seems strange to me that the authors of the corrigendum here, concerning a detail of the method, do not also correct their erroneous claims concerning "ignorant" priors. It's one thing to let errors lie in earlier work - no-one goes back and corrects minor details routinely - but it is unfortunate that when actually writing a correction about something they state does not substantially affect their results, they didn't take the opportunity to also correct a horrible error that has seriously misled much of the climate science community and which continues to undermine much work in this area. I'm left with the uncomfortable conclusion that they still don't accept that this aspect of the work was actually in error, despite my paper, which they are apparently trying to ignore rather than respond to. But I'm digressing.]

All this stuff about "objective priors" is just rhetoric - the term simply does not mean what a lay-person might expect (including a climate scientist not well-versed in statistical methodology). The posterior P(S|O) is equal to the (normalised) product of prior and likelihood - it makes no more sense to speak of a prior not influencing the posterior than it does to talk of the width of a rectangle not influencing its area (= width x height). Attempts to get round this by then footnoting a vaguer "minimal effect, relative to the data" are just shifting the pea around under the thimble.

In his blog post, Nic also extols the virtue of probabilistic coverage as a way of evaluating methods. This initially sounds very attractive - the idea being that your 95% intervals should include reality 95% of the time (and similarly for other intervals). There is however a devil in the detail here, because such a probabilistic evaluation implies some sort of (infinitely) repeated sampling, and it's critical to consider what is being sampled, and how. If you consider only a perfect repetition in which both the unknown parameter(s) and the uncertain observational error(s) take precisely the same values, then any deterministic algorithm will return the same answer, so the coverage in this case will be either 100% or 0%! Instead of this, Nic considers repetition in which the parameter is fixed and the uncertain observations are repeated. Perfect coverage in this case sounds attractive, but it's trivial to think of examples where it is simply wrong, as I'll now present.

Let's assume Alice picks a parameter S (we'll consider her sampling distribution in a minute) and conceals it from Bob. Alice also samples an "error" e from the simple Gaussian N(0,1). Alice provides the sum O=S+e to Bob, who knows the sampling distribution for e. What should Bob infer about S? Frequentists have a simple answer that does not depend on any prior belief about S - their 95% confidence interval will be (O-2,O+2) (yes I'm approximating negligibly throughout the post). This has probabilistically perfect coverage if S is held fixed and e is repeatedly sampled. Note that even this approach, which basically every scientist and statistician in the world will agree is the correct answer to the situation as stated, does not have perfect coverage if instead e is held fixed and S is repeatedly sampled! In this case, coverage will be 100% or 0%, regardless of the sampling distribution of S. But never mind about that.
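The fixed-S coverage claim for the frequentist interval is easy to check numerically. What follows is only a minimal sketch, not part of the original argument: the function name, the trial count and the seed are arbitrary choices made for illustration, and it uses the usual 1.96 half-width rather than the rounded 2.

```python
import numpy as np

def frequentist_coverage(S, n_trials=200_000, half_width=1.96, seed=0):
    """Fraction of intervals (O - half_width, O + half_width) that contain S
    when S is held fixed and the error e ~ N(0, 1) is resampled each time."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n_trials)   # Alice's observational errors
    O = S + e                           # what Bob actually sees
    covered = np.abs(O - S) < half_width
    return covered.mean()

# Coverage should sit close to 0.95 whatever fixed value of S is chosen.
for S in (0.0, 2.0, 10.0):
    print(S, round(frequentist_coverage(S), 3))
```

The interval is built from O alone, so the same code runs for any S, which is why the coverage does not depend on where S sits.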

As for Bayesians, well they need a prior on S. One obvious choice is a uniform prior and this will basically give the same answer as the frequentist approach. But now let's consider the case that Alice picks S from the standard Normal N(0,1), and tells Bob that she is doing so. The frequentist interval still works here (i.e., ignoring this prior information about S), but Bayesian Bob can do "better", in the sense of generating a shorter interval. Using the prior N(0,1) - which I assert is the only prior anyone could reasonably use - his Bayesian posterior estimate for S is the Normal N(O/2,0.7) (mean O/2, standard deviation about 0.7), giving a 95% probability interval of (O/2-1.4,O/2+1.4). It is easy to see that for a fixed S, and repeated observational errors e, Bob will systematically shrink his central estimates towards the prior mean 0, relative to the true value of S. Let's say S=2, then (over a set of repeated observations) Bob's posterior estimates will be centred on 1 (since the mean of all the samples of e is 0) and far more than 5% of his 95% intervals (including the roughly 21% of cases where e is more negative than -0.8) will fail to include the true value of S. Conversely, if S=0, then far too many of Bob's 95% intervals will include S. In particular, all cases where e lies in (-2.8,2.8) - which is about 99.5% of them - will generate posteriors that include 0. So coverage - or probability matching, as Nic calls it - varies from far too generous, when S is close to 0, to far too rare, for extreme values of S.
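The same kind of check can be run for Bob's Bayesian interval. This is a sketch under stated assumptions rather than anything taken from Nic's post or mine: it uses the standard conjugate Normal update (prior N(0,1) and unit-variance error give posterior mean O/2 and posterior variance 1/2), the unrounded half-width 1.96 x sqrt(0.5), and hypothetical names chosen for illustration.

```python
import math
import numpy as np

POST_SD = math.sqrt(0.5)        # posterior standard deviation, about 0.707
HALF_WIDTH = 1.96 * POST_SD     # about 1.39; rounded to 1.4 in the text

def bayes_coverage(S, n_trials=200_000, seed=0):
    """Fraction of Bob's 95% posterior intervals that contain S when S is
    held fixed and the error e ~ N(0, 1) is resampled each time."""
    rng = np.random.default_rng(seed)
    O = S + rng.standard_normal(n_trials)   # repeated observations
    centre = O / 2.0                        # posterior mean, shrunk towards 0
    return (np.abs(centre - S) < HALF_WIDTH).mean()

# Coverage depends strongly on the fixed value of S.
for S in (0.0, 1.0, 2.0, 3.0):
    print(S, round(bayes_coverage(S), 3))
```

Run as written, the coverage should come out above 99% for S=0 and well below 95% for S=2, falling further as S moves away from the prior mean.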

I don't think that any rational Bayesian could possibly disagree with Bob's analysis here. I challenge Nic to present any other approach, based on "objective" priors or anything else, and defend it as a plausible alternative to the above. Or else, I hope he will accept that probability matching is simply not (always) a valid measure of performance. These Bayesian intervals are unambiguously and indisputably the correct answer in the situation as described, and yet they do not provide the correct coverage conditional on a fixed value for S.

Just to be absolutely clear in summarising this - I believe Bayesian Bob is providing the only acceptable answer given the information as provided in this situation. No rational person could support a different belief about S, and therefore any alternative algorithm or answer is simply wrong. Bob's method does not provide matching probabilities, for a fixed S and repeated observations. Nothing in this paragraph is open to debate.

Therefore, I conclude that matching probabilities (in this sense, i.e. repeated sampling of obs for a fixed parameter) is not an appropriate test or desirable condition in general. There may be cases where it's a good thing, but this would have to be argued for explicitly.
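To illustrate how much the answer depends on what is being resampled, here is a minimal sketch of a third scheme that is not worked through above: both S and e are redrawn on every trial, with S taken from the stated N(0,1). The function name and simulation settings are assumptions made for illustration. Under this scheme the error of Bob's central estimate, O/2 - S = (e - S)/2, is itself Normal with variance 1/2, so his intervals should cover S about 95% of the time.

```python
import math
import numpy as np

def coverage_over_prior(n_trials=200_000, seed=0):
    """Fraction of Bob's 95% posterior intervals that contain S when both
    S ~ N(0, 1) and e ~ N(0, 1) are redrawn on every trial."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal(n_trials)        # Alice's parameter, new each time
    O = S + rng.standard_normal(n_trials)    # observation with fresh error
    half_width = 1.96 * math.sqrt(0.5)       # posterior sd is sqrt(1/2)
    return (np.abs(O / 2.0 - S) < half_width).mean()

print(round(coverage_over_prior(), 3))
```

This is matching in a different sense from the fixed-S test discussed here, so it does not rescue probability matching as Nic defines it.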