## Tuesday, April 22, 2014

### Objective probability or automatic nonsense?

A follow-up to the previous probability post.

Perhaps this will provide a clearer demonstration of the limitations of Nic's method. In his post, he conveniently provided a simple worked example, which in his view demonstrates how well his method works. A big advantage of using his example is that hopefully no-one can argue I've misapplied his method :-) This is Figure 2 from his post:

This example is based on carbon-14 dating, about which I know very little, but hopefully enough to explain what is going on. The x-axis in the above is real age with 0 corresponding to the "present day", which I think is generally defined as 1950 (so papers don't need to be continually reparsed as time passes). The y-axis is "carbon age" which is basically a measure of the C14 content of something under investigation, typically something organic (plant or animal). The basic idea is that the plant or animal took up C14 as it grew, but this C14 slowly decays so the proportion in the sample declines after death according to the C14 half-life. So in principle you would think that the age (at death) can be determined directly from measurement of the proportion of carbon that is C14. However, the proportion of C14 in the original organism depends on the ambient concentration of C14 which has varied significantly in the past (it's created by cosmic rays and the like), so there's quite a complicated calibration curve. The black line in the above is a simplified and stylised version of what a curve could look like (Nic's post also has a real calibration curve, but this example is clearer to work with).

So in the example above, the red Gaussian represents a radiocarbon measurement corresponding to a "carbon age" of about 1000y, with some uncertainty. This is mapped via the calibration curve into a real age distribution on the x-axis, and Nic has provided two worked examples using a uniform prior and his favoured Jeffreys prior.

As some of you may recall, I lived in Japan until recently. Quite by chance, my home town of Kamakura was the capital of Japan for a brief period roughly 7-800y ago. Lots of temples date from that time, and there are numerous wooden artefacts which are well-dated to the Kamakura Era (let's assume, carved out of contemporaneous wood, though of course wood is generally a bit older than the date of the tree felling). Let's see what happens when we try to carbon-date some of these artefacts using Nic's method.

Well, one thing that Nic's method will say with certainty is "this is not a Kamakura-era artefact"! The example above is a plausible outcome, with the carbon age of 1000y covering the entire Kamakura era. Nic's posterior (green solid curve) is flatlining along the axis over the range 650-900y, meaning zero probability for this whole range. The obvious reason for this is that his prior (dashed line) is also flatlining here, making it essentially impossible for any evidence, no matter how strong, to overturn the prior presumption that the age is not in this range.

It is important to recognise that the problem here is not with the actual measurement itself. In fact the measurement shown in the figure indicates very high likelihood (in the Bayesian sense) of the Kamakura era. The problem is entirely in Nic's prior, which ruled out this time interval even before the measurement was made - just because he knew that a measurement of carbon age was going to be made!
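To make the mechanism concrete, here is a minimal sketch in Python. It does not use Nic's actual curve or numbers: the piecewise-linear calibration curve, the plateau slope of 0.01 and the measurement uncertainty of 60y are all made up for illustration. The only non-arbitrary ingredient is that, for a Gaussian measurement error of fixed width, the Jeffreys prior on real age is proportional to the absolute slope of the calibration curve, so it nearly vanishes wherever the curve is nearly flat.

```python
import numpy as np

# Hypothetical stylised calibration curve (an assumption, not Nic's curve):
# slope 1 everywhere except a near-flat plateau between 650y and 900y.
knots_t = [0.0, 650.0, 900.0, 2000.0]
knots_c = [348.75, 998.75, 1001.25, 2101.25]

t = np.linspace(0.0, 2000.0, 2001)        # real age grid, 1y spacing
c = np.interp(t, knots_t, knots_c)        # carbon age at each real age
slope = np.gradient(c, t)                 # numerical dc/dt

# Hypothetical measurement: carbon age 1000y with Gaussian sd of 60y.
obs, sigma = 1000.0, 60.0
likelihood = np.exp(-0.5 * ((obs - c) / sigma) ** 2)

# Uniform prior: posterior proportional to the likelihood alone.
post_uniform = likelihood / likelihood.sum()

# Jeffreys prior: proportional to |dc/dt| for fixed-width Gaussian error.
jeffreys_prior = np.abs(slope)
post_jeffreys = likelihood * jeffreys_prior
post_jeffreys /= post_jeffreys.sum()

# Posterior probability that the real age lies inside the plateau.
plateau = (t > 650.0) & (t < 900.0)
mass_uniform = post_uniform[plateau].sum()
mass_jeffreys = post_jeffreys[plateau].sum()

print(f"P(650y < age < 900y), uniform prior:  {mass_uniform:.3f}")
print(f"P(650y < age < 900y), Jeffreys prior: {mass_jeffreys:.3f}")
```

The likelihood is identical in both cases and is at its maximum across the whole plateau. Any difference between the two printed probabilities comes purely from the prior.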

Nic uses the emotionally appealing terminology of "objective probability" for this method. I don't blame him for this (he didn't invent it) but I do wonder whether many people have been seduced by the language without understanding what it actually does. You can see Richard Tol insisting that the Jeffreys prior is "truly uninformative" in a comment on my previous post, for example. Well, that might be true, but only if you define "uninformative" in a technical sense not equivalent to common English usage. If you then use it in public, including among scientists who are not well versed in this stuff, then people are going to get badly misled. Frame and Allen went down this rabbit hole a few years ago; I'm not sure if they ever came out. It seems to work for many as an anchoring point: when you discuss it in detail, they acknowledge that yes, it's not really "uninformative" or "ignorant", but then they quickly revert to this usage, and the caveats somehow get lost.

I propose that it would be better to use the term "automatic" rather than "objective". What Nic is presenting is an automatic way of generating probabilities, though it remains questionable (to put it mildly) whether they are of any value. Nic's method insists that no trace remains of the Kamakura era, and I don't see any point in a probabilistic method that generates such obvious nonsense.

## Friday, April 18, 2014

### [jules' pics] Sheep

Sheep being a signpost

Sheep having a special feast

Sheep laughing at the stupid mountain tandemmers

Sheep being Zen rocks

These sheep look a bit funny to me. Lots of them appeared in the fields just this week. Maybe they are a special Easter variety.

--
Posted By Blogger to jules' pics at 4/18/2014 05:43:00 PM

### Coverage

Or, why Nic Lewis is wrong.

Long time no post, but I've been thinking recently about climate sensitivity (about which more soon) and was provoked into writing something by this post, in which Nic Lewis sings the praises of so-called "objective Bayesian" methods.

Firstly, I'd like to acknowledge that Nic has made a significant contribution to research on climate sensitivity, both through identifying a number of errors in the work of others (eg here, here and most recently here) and through his own contributions in the literature and elsewhere. Nevertheless, I think that what he writes about so-called "objective" priors and Bayesian methods is deeply misleading. No prior can encapsulate no knowledge, and underneath the use of these bold claims there is always a much more mealy-mouthed explanation in terms of a prior having "minimal" influence, and then you need to have a look at what "minimal" really means, and so on. Well, such a prior may or may not be a good thing, but it is certainly not what I understand "no information" to mean. I suggest that "automatic" is a less emotive term than "objective" and would be less likely to mislead people as to what is really going on. Nic is suggesting ways of automatically choosing a prior, which may or may not have useful properties.
[As a somewhat unrelated aside, it seems strange to me that the authors of the corrigendum here, concerning a detail of the method, do not also correct their erroneous claims concerning "ignorant" priors. It's one thing to let errors lie in earlier work - no-one goes back and corrects minor details routinely - but it is unfortunate that when actually writing a correction about something they state does not substantially affect their results, they didn't take the opportunity to also correct a horrible error that has seriously misled much of the climate science community and which continues to undermine much work in this area. I'm left with the uncomfortable conclusion that they still don't accept that this aspect of the work was actually in error, despite my paper which they are apparently trying to ignore rather than respond to. But I'm digressing.]

All this stuff about "objective priors" is just rhetoric - the term simply does not mean what a lay-person might expect (including a climate scientist not well-versed in statistical methodology). The posterior P(S|O) is equal to the (normalised) product of prior and likelihood - it makes no more sense to speak of a prior not influencing the posterior than it does to talk of the width of a rectangle not influencing its area (= width x height). Attempts to get round this by then footnoting a vaguer "minimal effect, relative to the data" are just shifting the pea around under the thimble.
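For reference, this is just Bayes' theorem written out, with S the unknown parameter and O the observations (S' is merely the integration variable in the normalising constant):

```latex
P(S \mid O) = \frac{P(O \mid S)\, P(S)}{\int P(O \mid S')\, P(S')\, \mathrm{d}S'}
```

The prior P(S) multiplies the likelihood at every value of S, so wherever the prior is zero the posterior is zero too, whatever the data say.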

In his blog post, Nic also extols the virtue of probabilistic coverage as a way of evaluating methods. This initially sounds very attractive - the idea being that your 95% intervals should include reality, 95% of the time (and similarly for other intervals). There is however a devil in the detail here, because such a probabilistic evaluation implies some sort of (infinitely) repeated sampling, and it's critical to consider what is being sampled, and how. If you consider only a perfect repetition in which both the unknown parameter(s) and the uncertain observational error(s) take precisely the same values, then any deterministic algorithm will return the same answer, so the coverage in this case will be either 100% or 0%! Instead of this, Nic considers repetition in which the parameter is fixed and the uncertain observations are repeated. Perfect coverage in this case sounds attractive, but it's trivial to think of examples where it is simply wrong, as I'll now present.

Let's assume Alice picks a parameter S (we'll consider her sampling distribution in a minute) and conceals it from Bob. Alice also samples an "error" e from the simple Gaussian N(0,1). Alice provides the sum O=S+e to Bob, who knows the sampling distribution for e. What should Bob infer about S? Frequentists have a simple answer that does not depend on any prior belief about S - their 95% confidence interval will be (O-2,O+2) (yes I'm approximating negligibly throughout the post). This has probabilistically perfect coverage if S is held fixed and e is repeatedly sampled. Note that even this approach, which basically every scientist and statistician in the world will agree is the correct answer to the situation as stated, does not have perfect coverage if instead e is held fixed and S is repeatedly sampled! In this case, coverage will be 100% or 0%, regardless of the sampling distribution of S. But never mind about that.
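Here is a quick simulation sketch of both kinds of repetition, taking the frequentist interval to be O plus or minus 2, i.e. two standard deviations of e either side of the observation. The fixed value S=2, the two fixed errors (0.5 and 2.5) and the wide Normal used to resample S are arbitrary illustrative choices, not anything from Nic's post.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def covers(S, e, half_width=2.0):
    """True where the interval (O - half_width, O + half_width), with O = S + e, contains S."""
    O = S + e
    return (O - half_width < S) & (S < O + half_width)

# Case 1: S held fixed, error e resampled from N(0,1).
S_fixed = 2.0                                   # arbitrary illustrative value
cov_fixed_S = covers(S_fixed, rng.standard_normal(n)).mean()

# Case 2: e held fixed, S resampled (the distribution of S is irrelevant here;
# a wide Normal is used purely as an example).
S_samples = rng.normal(0.0, 5.0, n)
cov_small_e = covers(S_samples, 0.5).mean()     # |e| < 2: always covers
cov_large_e = covers(S_samples, 2.5).mean()     # |e| > 2: never covers

print(f"S fixed, e resampled: coverage = {cov_fixed_S:.3f}")
print(f"e = 0.5 fixed, S resampled: coverage = {cov_small_e:.3f}")
print(f"e = 2.5 fixed, S resampled: coverage = {cov_large_e:.3f}")
```

Whether the interval contains S depends only on whether |e| is less than 2, which is why the answer in the second case is all-or-nothing.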

As for Bayesians, well they need a prior on S. One obvious choice is a uniform prior and this will basically give the same answer as the frequentist approach. But now let's consider the case that Alice picks S from the standard Normal N(0,1), and tells Bob that she is doing so. The frequentist interval still works here (i.e., ignoring this prior information about S), but Bayesian Bob can do "better", in the sense of generating a shorter interval. Using the prior N(0,1) - which I assert is the only prior anyone could reasonably use - his Bayesian posterior estimate for S is the Normal N(O/2,0.7) (that is, mean O/2 and standard deviation 0.7), giving a 95% probability interval of (O/2-1.4,O/2+1.4). It is easy to see that for a fixed S, and repeated observational errors e, Bob will systematically shrink his central estimates towards the prior mean 0, relative to the true value of S. Let's say S=2, then (over a set of repeated observations) Bob's posterior estimates will be centred on 1 (since the mean of all the samples of e is 0) and far more than 5% of his 95% intervals (namely the roughly 21% of cases where e is more negative than -0.8) will fail to include the true value of S. Conversely, if S=0, then far too many of Bob's 95% intervals will include S. In particular, all cases where e lies in (-2.8,2.8) - which is about 99.5% of them - will generate posteriors that include 0. So coverage - or probability matching, as Nic calls it - varies from far too generous, when S is close to 0, to far too rare, for extreme values of S.
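Bob's coverage can be checked by simulation. The sketch below uses the interval O/2 plus or minus 1.4 (the posterior for a N(0,1) prior and a N(0,1) error has mean O/2 and variance 1/2, so a standard deviation of about 0.7). The particular fixed values of S, the sample size and the function name `bob_covers` are arbitrary illustrative choices. The last case, where S is itself redrawn from the prior each time, is included only for comparison with the fixed-S cases.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def bob_covers(S, e, half_width=1.4):
    """True where Bob's interval (O/2 - half_width, O/2 + half_width), with O = S + e, contains S."""
    centre = (S + e) / 2.0
    return np.abs(S - centre) < half_width

# Coverage for several fixed values of S, with e resampled from N(0,1).
coverage_fixed = {}
for S in (0.0, 1.0, 2.0, 3.0):
    coverage_fixed[S] = bob_covers(S, rng.standard_normal(n)).mean()
    print(f"S = {S:.0f} fixed: coverage = {coverage_fixed[S]:.3f}")

# Coverage when S is redrawn from the prior N(0,1) on every repetition.
S_prior = rng.standard_normal(n)
coverage_prior = bob_covers(S_prior, rng.standard_normal(n)).mean()
print(f"S ~ N(0,1): coverage = {coverage_prior:.3f}")
```

For fixed S the coverage falls steadily as S moves away from the prior mean, while averaging over Alice's actual sampling distribution brings it back to roughly the nominal level.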

I don't think that any rational Bayesian could possibly disagree with Bob's analysis here. I challenge Nic to present any other approach, based on "objective" priors or anything else, and defend it as a plausible alternative to the above. Or else, I hope he will accept that probability matching is simply not (always) a valid measure of performance. These Bayesian intervals are unambiguously and indisputably the correct answer in the situation as described, and yet they do not provide the correct coverage conditional on a fixed value for S.

Just to be absolutely clear in summarising this - I believe Bayesian Bob is providing the only acceptable answer given the information as provided in this situation. No rational person could support a different belief about S, and therefore any alternative algorithm or answer is simply wrong. Bob's method does not provide matching probabilities, for a fixed S and repeated observations. Nothing in this paragraph is open to debate.

Therefore, I conclude that matching probabilities (in this sense, i.e. repeated sampling of obs for a fixed parameter) is not an appropriate test or desirable condition in general. There may be cases where it's a good thing, but this would have to be argued for explicitly.


## Thursday, April 03, 2014

### [jules' pics] great spotted peanut-eater

Otherwise known as a greater spotted woodpecker... I suppose that she is probably sparrowhawk proof.

--
Posted By Blogger to jules' pics at 4/03/2014 08:24:00 PM

## Wednesday, April 02, 2014

### Another journal editor resigns!

Regular readers will have noticed that I follow the goings-on at EGU journals with some interest. So in that vein I'd like to point out there have been some recent changes at GMD. Perhaps most notably, our Dear Leader Dan Lunt has stepped down from the position of Chief Executive Editor, which he has held since the journal's inception about 6 years ago. Jules is the incoming chief. (Chief doesn't actually have any extra powers that I'm aware of, but is expected and trusted to take the lead on many decisions with or without discussion.) Bob Marsh has been added to the list of execs - this happened last year actually - having been a topical editor for some time. And...drum roll...I am no longer on the list of execs, though I'll remain a topical editor. All the execs feel that the journal (indeed all EGU journals) should be regarded as community assets rather than personal fiefdoms. So although it made sense to stick with a core team who shared a clear vision through the early years, we realised some time ago that it was time to bring in new ideas and let things evolve a bit. This feeling has been informally formalised through a rough plan to swap execs off the board on an every-two-years basis - Bob's induction was the start of this, staggered with my resignation to allow a bit of settling in time - and also rotate the chief exec position among the board members. I'm happy to leave the journal in the capable hands of the new board.

Incidentally, it is rumoured that the new Impact Factor for the journal will be approaching 6, up from 5 last year. That should put us even closer to the top of the list for journals in the geosciences! I'm sure that GMD, and all the other EGU journals, will continue to go from strength to strength as the open access movement continues to gain momentum.