Friday, February 22, 2013

Yet more on uniform priors and the misinterpretation of p-values

From this (which is very relevant to climate science, where misinterpretation of p-values is entirely routine):
The default conclusion from a noninformative prior analysis will almost invariably put too much probability on extreme values. A vague prior distribution assigns much of its probability on values that are never going to be plausible, and this disturbs the posterior probabilities more than we tend to expect—something that we probably do not think about enough in our routine applications of standard statistical methods.
Of course, it's not something that will be news to readers of this blog...but it's a shame that this elite group seems to not include IPCC authors and their colleagues...

45 comments:

Magnus Westerstrand said...

Thanks for that link. IMHO a post about how this relates to climate, in "layman's" terms, would be nice.

David B. Benson said...

Still think you should try the unshifted Lévy distribution
http://en.wikipedia.org/wiki/L%C3%A9vy_distribution
as an only slightly informed prior for determining the ever-popular Charney equilibrium climate sensitivity.

Naturally, of course, use
c/3 = 2.718281828459045
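A quick way to see what this suggestion implies, assuming the prior is placed directly on S in degrees C: the unshifted Lévy distribution with scale c has its mode at c/3, so c/3 = e puts the mode at about 2.72. The sketch below (function names are illustrative) uses the closed-form CDF, P(S <= s) = erfc(sqrt(c/(2s))), to print the tail probabilities the prior implies.

```python
import math

E = 2.718281828459045
C = 3 * E  # scale c chosen so that the mode, c/3, equals e

def levy_pdf(s, c=C):
    """Density of the unshifted Lévy distribution (location 0, scale c)."""
    if s <= 0:
        return 0.0
    return math.sqrt(c / (2 * math.pi)) * math.exp(-c / (2 * s)) / s ** 1.5

def levy_cdf(s, c=C):
    """P(S <= s) = erfc(sqrt(c / (2 s))) for s > 0."""
    if s <= 0:
        return 0.0
    return math.erfc(math.sqrt(c / (2 * s)))

for threshold in (3, 6, 10, 20):
    print(f"P(S > {threshold:>2}) = {1 - levy_cdf(threshold):.3f}")
```

The Lévy survival function decays only like s^(-1/2), so these tail numbers are worth checking before adopting it as a prior for sensitivity.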

nic-lewis said...

"A vague prior distribution assigns much of its probability on values that are never going to be plausible, and this disturbs the posterior probabilities more than we tend to expect"

I disagree with Gelman on this. An objectivist Bayesian position is that a noninformative prior distribution has no real interpretation in probabilistic terms (see, e.g., Bernardo and Smith, 1994). It should rather be seen as a weight function that reflects how informative the data is about the parameters at varying parameter values (and it can also, if necessary, be data dependent). Of course, I don't expect a subjectivist Bayesian to agree.

James Annan said...

Nic, do you disagree that U[0,20] assigns 70% probability to the region 6-20?
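The 70% figure is simply the fraction of the interval lying above 6. A minimal sketch of that arithmetic, which also shows how the implied P(S > 6) moves as the (arbitrary) upper bound changes; the bounds other than 20 are illustrative:

```python
def uniform_tail_prob(threshold, lower, upper):
    """P(x > threshold) under a uniform distribution U[lower, upper]."""
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    # Clip the threshold into [lower, upper]; the tail mass is the
    # fraction of the interval lying above it.
    t = min(max(threshold, lower), upper)
    return (upper - t) / (upper - lower)

for upper in (10, 20, 50, 100):
    print(f"U[0,{upper}]: P(S > 6) = {uniform_tail_prob(6, 0, upper):.2f}")
```

For upper bounds of 10, 20, 50 and 100 this gives 0.40, 0.70, 0.88 and 0.94 respectively.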

EliRabett said...

So if you already know the answer then Bayesian analysis is useful, and if not, not? :)

Something tells Eli that is not your point, but how do you express ignorance usefully in your prior?

James Annan said...

Eli,

Well you cannot directly express the concept of not knowing in the Bayesian paradigm. Any prior has a specific and precise probabilistic interpretation. However, you can use a range of priors to test the sensitivity of your result to the assumptions, and I'd generally recommend that approach.

For example, we might not have a firm opinion on what prior probability to assign to the sensitivity being greater than 6C. I assert that 70% is a pretty ridiculous value, however. My own proposed Cauchy prior gives 18% for this, which still seems rather high.
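Testing the sensitivity of a result to the prior can be sketched as follows. Everything here is a hypothetical toy: the data are assumed to give a Gaussian estimate of 1/S (mean 1/3, standard deviation 0.1), and the Cauchy location and scale (2.5 and 3) are illustrative values, not necessarily those of the proposal mentioned above. Both priors are truncated to the integration range (0.05, 20].

```python
import math

def likelihood(s, mean_inv=1 / 3, sd_inv=0.1):
    """Hypothetical likelihood: the data give a Gaussian estimate of 1/S."""
    return math.exp(-0.5 * ((1 / s - mean_inv) / sd_inv) ** 2)

def uniform_prior(s):
    return 1.0  # flat on the integration range (0.05, 20]

def cauchy_prior(s, loc=2.5, scale=3.0):
    # Unnormalised Cauchy density; illustrative location and scale.
    return 1.0 / (1 + ((s - loc) / scale) ** 2)

def posterior_tail(prior, threshold=6.0, s_min=0.05, s_max=20.0, n=20000):
    """Posterior P(S > threshold), integrating prior * likelihood on a
    midpoint grid over [s_min, s_max] (this truncates both priors)."""
    h = (s_max - s_min) / n
    total = tail = 0.0
    for i in range(n):
        s = s_min + (i + 0.5) * h
        w = prior(s) * likelihood(s)
        total += w
        if s > threshold:
            tail += w
    return tail / total

for name, prior in (("uniform", uniform_prior), ("Cauchy", cauchy_prior)):
    print(f"{name:8s} posterior P(S > 6) = {posterior_tail(prior):.3f}")
```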

James Annan said...

Magnus, I'm not sure a whole post is justified, but if for example we parse the IPCC phrasing that the observed warming is "(very) unlikely to be due to natural causes alone" (which is directly based on D&A analyses) then it's hard to avoid the conclusion that it's a nonsensical misinterpretation of a p-value.

What they could have said is that it is (very) unlikely that an unforced climate would have warmed this much. But what they actually said is that it's very unlikely that anthropogenic forcing has no effect, i.e., a 10% probability that it *does* have no effect. Which is bonkers.
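The distinction can be made concrete with a toy test (all numbers hypothetical): a statistic z is N(0, 1) if there is no forced effect (H0) and N(2, 1) otherwise. The p-value is the probability of a statistic at least this large under H0; the probability that H0 is true given the data also depends on a prior for H0, which the p-value alone does not supply.

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def posterior_null(z, prior_null, effect):
    """P(H0 | z) for a toy test: z ~ N(0,1) under H0, N(effect,1) under H1."""
    num = prior_null * normal_pdf(z)
    den = num + (1 - prior_null) * normal_pdf(z - effect)
    return num / den

z_obs = 1.645  # hypothetical observed statistic
print(f"p-value P(Z >= z | H0) = {normal_sf(z_obs):.3f}")
for prior_null in (0.1, 0.5, 0.9):
    print(f"prior P(H0) = {prior_null}: P(H0 | z) = "
          f"{posterior_null(z_obs, prior_null, effect=2.0):.3f}")
```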

crandles said...

>"you can use a range of priors"

>"My own proposal of Cauchy prior"

While it is pedantry to point out the change from plural to singular...

Isn't there a case for each researcher to suggest not only their preferred best estimate of a suitable distribution, but also what they think are reasonable skewed-high, skewed-low and skewed-wide (and narrow?) distributions?

To compare different analysis methods, it would be useful if a set of standard distributions emerged. The existence of such a standard set would not stop a researcher from also using his own preferred distribution(s).

Is that feasible, or would the appropriate prior (i.e., the knowledge excluding the data each analysis method uses) differ so much between methods that it wouldn't work well?

crandles said...

>"it's hard to avoid the conclusion that it's a nonsensical misinterpretation of a p-value.
...
ie, 10% probability that it *does* have no effect."

I agree that any misinterpretation of a p-value is a bad example to set even if the conclusion is correct.

But if any misinterpretation is a bad example, then it should also be pointed out that "very likely" means over 90% probability. Therefore they are only saying there is ***up to*** a 10% probability that anthropogenic forcing has no effect.

So the example reaches a correct conclusion, perhaps with an unnecessarily wide range, but sets a bad example by misinterpreting a p-value.

(Nonsensical doesn't really fit that situation does it?)

Jonathan Jones said...

James, is there a confusion here between uniform priors and uninformative priors?

crandles said...

Jonathan, not here I hope. But there has been confusion among certain climate scientists who have indicated a strong preference for uniform priors because they regard such priors as uninformed/ignorant.

James has written papers and, I believe, made some progress on showing that a uniform prior is not ignorant and that the choice of upper bound (for climate sensitivity estimation) makes a difference to the views portrayed by the prior.

'Some progress' may not extend as far as getting the IPCC report rewritten to remove all the statements that misinterpret p-values.

nic-lewis said...

James
"Nic, do you disagree that U[0,20] assigns 70% probability to the region 6-20?"

I will assume, to simplify matters, that the U[0,20] range encompasses the range over which the observed likelihood function could be significant. On that basis, if the data corresponded to some linear function of a single parameter, measured with symmetrical random errors obeying some fixed statistical distribution (e.g., Gaussian), then a uniform [0,20] prior would convey no information as to the probability of the parameter value lying in the region [6,20]. Rather, it would be completely noninformative, in that the inferred posterior density for the parameter would be entirely determined by the data. In such a case, Bayesian parameter inference using a wide uniform prior would be identical to the frequentist inference, leaving aside the frequentist refusal to assign a probability distribution to a fixed but unknown parameter.
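The linear-Gaussian case described above can be checked numerically. In this sketch (hypothetical numbers: one measurement x = 10 with error sigma = 1 and a uniform prior on [0, 20]), the 5-95% posterior interval computed on a grid should coincide with the frequentist 90% confidence interval x ± 1.645 sigma, because the posterior is just the normalised likelihood when the data sit well inside the prior range.

```python
import math
from statistics import NormalDist

def credible_interval(x_obs, sigma, lo=0.0, hi=20.0, n=200_000, level=0.90):
    """Equal-tailed credible interval for theta given x ~ N(theta, sigma^2)
    and a uniform prior on [lo, hi], found from the cumulative posterior."""
    h = (hi - lo) / n
    grid = [lo + (i + 0.5) * h for i in range(n)]
    # Flat prior: the posterior is proportional to the likelihood.
    weights = [math.exp(-0.5 * ((x_obs - t) / sigma) ** 2) for t in grid]
    total = sum(weights)
    alpha = (1 - level) / 2
    cum, lower, upper = 0.0, None, None
    for t, w in zip(grid, weights):
        cum += w / total
        if lower is None and cum >= alpha:
            lower = t
        if upper is None and cum >= 1 - alpha:
            upper = t
    return lower, upper

x_obs, sigma = 10.0, 1.0            # hypothetical measurement and error
z = NormalDist().inv_cdf(0.95)      # about 1.645
print("uniform-prior 5-95%:", credible_interval(x_obs, sigma))
print("frequentist 90% CI: ", (x_obs - z * sigma, x_obs + z * sigma))
```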

The reason why the use of a [0,20] uniform prior for climate sensitivity estimates in AR4 WG1 gave a strong bias towards very high sensitivity values is principally that the data involved bore highly nonlinear relationships to climate sensitivity, the parameter concerned, with the data being much less informative about high sensitivity values than low ones. Therefore, to be noninformative the prior must decline with increasing sensitivity.

In an instrumental-period study where climate sensitivity was the only parameter being estimated, such as Forster & Gregory (2006) and Gregory et al (2002), it corresponds to the ratio of global change in surface temperature to global change in radiative imbalance/ ocean heat uptake net of radiative forcing. Since the (approximately Gaussian) uncertainty in global changes in radiative imbalance/ ocean heat uptake net of radiative forcing is much larger than the uncertainty in global surface temperature change, that ratio - to which the reciprocal of climate sensitivity has a linear relationship - has nearly Gaussian errors. Therefore, a wide uniform prior in the reciprocal of climate sensitivity would be noninformative for its estimation from that ratio. That implies that, in such cases, a 1/Sensitivity^2 prior in climate sensitivity would be noninformative for estimating climate sensitivity. So the uniform-in-sensitivity prior on which the Forster & Gregory (2006) results were (re)stated in AR4 greatly increased the apparent probability of high climate sensitivity.
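The change of variables behind the 1/Sensitivity^2 statement: if lam = 1/S has a uniform prior, then p_S(S) = p_lam(1/S) |d(1/S)/dS|, which is proportional to 1/S^2. The sketch below checks this numerically with a hypothetical Gaussian estimate of lam (mean 1/3, standard deviation 0.1) and an assumed prior support of S in [0.5, 20]: integrating over S with a 1/S^2 prior reproduces the exact truncated-Gaussian answer for a uniform prior in lam, while a uniform prior in S gives a larger P(S > 6).

```python
import math
from statistics import NormalDist

M, SD = 1 / 3, 0.1        # hypothetical Gaussian estimate of lam = 1/S
A, B = 1 / 20, 1 / 0.5    # prior support for lam, i.e. S in [0.5, 20]

def tail_uniform_in_lam(threshold):
    """Exact P(S > threshold) for a uniform prior on lam = 1/S over [A, B]:
    the posterior for lam is a Gaussian truncated to [A, B]."""
    nd = NormalDist(M, SD)
    return (nd.cdf(1 / threshold) - nd.cdf(A)) / (nd.cdf(B) - nd.cdf(A))

def tail_on_s_grid(threshold, prior, n=50_000):
    """P(S > threshold) integrating prior(S) * likelihood on a grid in S."""
    s_lo, s_hi = 1 / B, 1 / A
    h = (s_hi - s_lo) / n
    total = tail = 0.0
    for i in range(n):
        s = s_lo + (i + 0.5) * h
        w = prior(s) * math.exp(-0.5 * ((1 / s - M) / SD) ** 2)
        total += w
        if s > threshold:
            tail += w
    return tail / total

def jacobian_prior(s):
    # Uniform in lam mapped to S: |d(1/S)/dS| = 1/S**2.
    return 1 / s ** 2

def flat_prior(s):
    return 1.0

print("uniform in 1/S (exact):", round(tail_uniform_in_lam(6), 4))
print("1/S^2 prior in S      :", round(tail_on_s_grid(6, jacobian_prior), 4))
print("uniform prior in S    :", round(tail_on_s_grid(6, flat_prior), 4))
```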

Why don't you ask Steve Jewson if he agrees with me on this, if you doubt my logic?

The Frame et al (2005) study, for which you quite rightly objected to the use of a U[0,20] prior in sensitivity, is a more complex case, both because of the "Bayesian" method used (comparing observations with model simulations) and because of the unusual asymmetrical, unstated error distributions involved. But, clearly, the U[0,20] prior used was highly informative, and the resulting 1.2-11.8 C 5-95% range for climate sensitivity was greatly biased upwards at the upper end.

Magnus Westerstrand said...

Thanks James; however, I am more after a description of how to decide what prior to use... with climate examples...

crandles said...

Magnus,

James's paper, in which he works through the undesirable properties of uniform priors and also some reasonable priors, is

http://www.jamstec.go.jp/frsgc/research/d5/jdannan/probrevised.pdf

See pages 9, 10 and 11 in particular.

You asked for layman's terms, but the paper seems quite readable to me.

Blog posts on the topic include:

http://julesandjames.blogspot.co.uk/2006/03/comment-on-frame-et-al.html

I cannot find believe_grl.pdf (rejected) which may have had more explanation.

crandles said...

nic-lewis,

If you follow the 'comment on Frame et al' link I gave above, you will see that James wrote

"they use "no knowledge" to justify their choice of a uniform prior). In context, this could reasonably be re-written as (A) "what would our estimate of climate sensitivity be, if we had no data and knowledge other than that directly considered by this study?"

However, it seems clear to us that what users really want to know is (B) "what is our estimate of climate sensitivity, using all of our data and knowledge?"

The answer to question A will necessarily have greater uncertainty than the answer to question B. If someone wants to generate an estimate of climate sensitivity, they should use all of the data, either by explicitly considering it, or by the use of a prior which encapsulates (as accurately as possible) the information which the study doesn't directly look at!"

So the logic you are suggesting does not seem to be in doubt. The choice of view depends on whether you want

a) Information from the study's analysis only, i.e., just the likelihood function and nothing else; or

b) A realistic estimate of climate sensitivity.

a) is OK if that is what you want, as long as you don't then go and present the result as a realistic estimate of climate sensitivity.

Most people are interested in what is a realistic estimate and it appears to me that you want to fall short of doing that.

Perhaps you would like to consider this situation: http://julesandjames.blogspot.co.uk/2012/11/that-xkcd-cartoon.html

Betting on the end of the world doesn't make much sense. But would you prefer the frequentist answer, your own version of Bayesianism, or James's style of Bayesian answer?
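Assuming the linked post is about xkcd's "Frequentists vs. Bayesians" strip (a detector asked whether the sun has exploded rolls two dice and lies if both come up six), the two answers can be compared in a few lines. The prior values are purely illustrative.

```python
def posterior_exploded(prior, lie_prob=1 / 36):
    """P(sun exploded | detector says YES) when the detector answers
    falsely with probability lie_prob (both dice showing six)."""
    num = prior * (1 - lie_prob)
    return num / (num + (1 - prior) * lie_prob)

# Frequentist reading: P(YES | no explosion) = 1/36 < 0.05, so "reject".
print("p-value:", round(1 / 36, 4))
for prior in (1e-9, 1e-3, 0.5):
    print(f"prior {prior:g} -> posterior {posterior_exploded(prior):.3g}")
```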

nic-lewis said...

crandles

I think what you want is the result from a type of meta-analysis of more than one study. That is fine, and hopefully will provide a better estimate of climate sensitivity (or whatever parameter is concerned) - although generating valid inference from combinations of studies, which inter alia may not use independent data, is non-trivial.

But that wasn't the question I was answering, which concerned the interpretation of a uniform prior. And the justification given in Frame et al. for the use of a uniform prior for estimating climate sensitivity (as a parameter in its own right) was simply wrong. That prior was actually highly informative, towards a high estimate of sensitivity, although I don't think the prior can be interpreted in the mechanically probabilistic way that James interprets it.

Magnus Westerstrand said...

Thanks for that link!

This might need some time to sink in, but one thing that is not taken into account is how, e.g., permafrost etc. could act... or abrupt changes in ocean currents, biota etc. So sure, it might be a better way to constrain... but what if you cut the uniform prior at, say, 7 or 8? Or 6? Expert opinion is all well and good, but we do not know the unknown.

Still probably better, as said, but making big economic decisions on this makes my stomach uneasy...

crandles said...

Magnus,

Changing the upper bound makes a big difference to the posterior probability. James showed that if you want your posterior to show something ridiculous, say a probability of over 10% that climate sensitivity exceeds 30C, then as long as the data don't pin down the answer perfectly (i.e., always), there will always be an upper bound for a uniform prior that achieves it.

With that sort of procedure, a queasy stomach seems a suitable reaction.

Fortunately, if you use expert priors that are as different as seems reasonable and the posteriors vary widely, then you know your data isn't good enough. If you use such priors and there is little difference between the posteriors, then great: you have a sensible answer.
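The upper-bound argument can be sketched numerically. Under a hypothetical likelihood that is Gaussian in 1/S (mean 1/3, standard deviation 0.1), the likelihood does not fall to zero as S grows, so extending a uniform prior's upper bound keeps adding posterior mass at absurd values. The function below (names are illustrative) doubles the bound until P(S > 30C) exceeds 10%.

```python
import math

def likelihood(s, mean_inv=1 / 3, sd_inv=0.1):
    """Hypothetical likelihood: a Gaussian estimate of 1/S. It tends to a
    non-zero constant as S grows, since 1/S -> 0 stays somewhat plausible."""
    return math.exp(-0.5 * ((1 / s - mean_inv) / sd_inv) ** 2)

def posterior_tail_uniform(upper, threshold=30.0, n=20000, s_min=0.01):
    """Posterior P(S > threshold) under a uniform prior U[0, upper]."""
    h = (upper - s_min) / n
    total = tail = 0.0
    for i in range(n):
        s = s_min + (i + 0.5) * h
        w = likelihood(s)  # flat prior: posterior is proportional to likelihood
        total += w
        if s > threshold:
            tail += w
    return tail / total

def bound_for_target(target=0.10, threshold=30.0, start=20.0):
    """Double the uniform prior's upper bound until P(S > threshold) > target."""
    upper = start
    while posterior_tail_uniform(upper, threshold) <= target:
        upper *= 2
    return upper

u = bound_for_target()
print(f"U[0,{u:g}] gives P(S > 30C) = {posterior_tail_uniform(u):.3f}")
```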

Magnus Westerstrand said...

crandles,

Yes, but you could also get a reasonable value by using a low upper limit?

As for the expert judgment, you have to be in the right ballpark... using, e.g., values from recent temperature measurements will not tell you what might happen with the "permafrost" in the future...

crandles said...

nic-lewis,

Yes, I accept that using only independent data is certainly not trivial.

>"meta-analysis of more than one study"

If you want to take the position that only meta-analyses of more than one data set can produce realistic estimates, then I think you may be able to stick to your views. That leaves James and me with the problem of how you prevent people from looking at one study and taking its headline result to be realistic, which seems to me to be what people want to be able to do.

I saw James's question as a matter of first determining your views in order to know how to attack them.

>" That prior was actually highly informative, towards a high estimate of sensitivity, although I don't think the prior can be interpreted in the mechanically probabilistic way that James interprets it."

Agreed, the prior was highly informative. But that raises the question of how it can be highly informative if you cannot interpret it in a probabilistic way.

Obviously James and I think it can be interpreted in a probabilistic way and judged to have too high a probability for climate sensitivity over 5C. So is it that an objectivist Bayesian doesn't think it can be so interpreted, while a subjectivist Bayesian thinks it can?

Or do you believe the prior cannot be interpreted in a probabilistic way?

I see you haven't commented on the xkcd-cartoon situation I provided a link to. Doesn't that show how people want to interpret the headline result as realistic?


crandles said...

Magnus,

Yes, it follows that some lower upper limit will give a sensible answer. But that is likely to be a case of two wrongs happening to make a right. The cliff edges of a uniform prior are unlikely to be sensible; far more likely, the probabilities of a sensible prior tail away smoothly to very low levels.

Permafrost, of course, releases GHGs, so it shouldn't affect climate sensitivity (i.e., the temperature change in response to a doubling of CO2 levels). Instead, it is an extra GHG forcing.

Sensitivity may change but the likely direction is downward as sea and other ice disappears.

(To avoid giving the wrong impression, I think there is plenty of reason for more action to reduce GHG emissions. More risk would make more action appropriate. We don't need to falsely play up the risks to make action appropriate.)

James Annan said...

Chris seems to have covered it all very well, thanks!

Nic, your reply seems rather evasive and awkward to me. There is no reason to introduce all sorts of quibbles and conditions about likelihoods. My proposition was very trivial and straightforward:

A uniform probability distribution U[0,20] for x carries with it the implication that P(x > 6) is 70%.

If you cannot agree unequivocally with this statement, unconditionally and irrespective of how you might interpret and apply the concept of "probability" in the real world, then you simply aren't talking about probability.

Magnus Westerstrand said...

I agree that James's argument in the paper for expert priors is a good one... I just do not see it as a definitive one. A narrow bell-shaped distribution might be strange; I do not know at the moment how that might play out, and perhaps there are other reasons not to have too low an upper limit...

Sure, permafrost does not affect sensitivity by definition, but isn't the reason for all of this, more or less, the economic models? And for those it will have more or less the same implications? (I also mentioned other what-ifs.) So using experts might narrow the upper limit too much... there is no way to know for sure. However, I have to admit that James puts forward a strong argument and that it seems better than the alternative (though I have not seen the literature on it, I confess I am new to all of this, and there are other problems with economic models).

Just looking at the examples in the linked paper, I am a bit surprised at how much you could narrow the span with expert guessing... using single studies. Still a good argument, but no proof.

nic-lewis said...

James
"My proposition was very trivial and straightforward:

A uniform probability distribution U[0,20] for x carries with it the implication that P(x > 6) is 70%"

I wasn't seeking to be evasive, but I (apparently wrongly) took your question to relate to the use of a U[0,20] distribution as a prior. As I indicated, I don't think a prior has, in general, a direct probabilistic interpretation.

Viewed simply as a probability distribution for x, then indeed U[0,20] implies P(x > 6) is 70%.

crandles said...

nic-lewis,

Maybe you missed James's reply to Eli saying

"Any prior has a specific and precise probabilistic interpretation."

So James's question did relate to a prior.

You can stick to your view if you want, but you do seem to be unnecessarily tying yourself up in knots to try and maintain it.

crandles said...

OT sorry. Does this sound like a suitable bet target:

"My projections for our planet conditions when the sea-ice has all vanished year round (PIOMAS graph projects about 2024 for this; I forecast 2020 for this) are:
Average global temperature: 22°C (+/- 1°C)
(rise of 6-8°C above present day value of about 15°C)"

"Paul Beckwith, B.Eng, M.Sc. (Physics),
Ph. D. student (Climatology) and
Part-time Professor, University of Ottawa"

http://arctic-news.blogspot.co.at/2012/06/when-sea-ice-is-gone.html

James Annan said...

Nic, I don't understand what you mean by "prior" if you think it does not have a probabilistic interpretation.

"Prior" is simply shorthand for prior probability distribution.

James Annan said...

Chris, I'd say yes, but some random clueless student probably isn't worth bothering with :-)

EliRabett said...

To turn this argument somewhat around, consider a case where the prior assigns a zero probability to some outcome (for example from a theoretical limit). What results if reality (aka the data) differs?

crandles said...

Eli,

Not quite sure where you are going so hope this isn't too irrelevant.

The prior should still have tails outside the theoretical limit depending on how likely it is that the theoretical limit is wrong.

If the theoretical limit is very sound, then the data may well be misleading but is still likely to indicate that the real value is close to the theoretical limit and that is what the posterior pdf will tell you.

That might give you a tight range in your posterior when there is a chance there is a major problem with the data and the posterior should still be very wide.

Steve Bloom said...

Re the clueless student, I seem to recall him very confidently predicting an ice-free summer in 2012.

nic-lewis said...

James
You say "Prior" is simply shorthand for prior probability distribution.

But some great Bayesian statisticians regard non-informative priors either as chosen to express ignorance relative to the information that can be supplied by a particular experiment (Box and Tiao, 1973, p. 46) or as having no direct probabilistic interpretation (Bernardo and Smith, 1994, p. 306). Indeed, Bernardo and Smith describe reference priors as "merely pragmatically convenient tools for the derivation of reference posterior distributions".

Further, the great statistician who has been around longest, Don Fraser, wrote in 2011 in "Default priors and approximate location models": "A prior for statistical inference can be one of three basic types: a mathematical prior originally proposed in Bayes (1763), a subjective prior presenting an opinion, or a truly objective prior based on an identified frequency reference."

So, while subjectivist Bayesians may be clear as to what they think priors represent, there are other views - just as there are as to what probability represents.

James Annan said...

Nic, you are still confusing the issue with interpretations of probability.

A prior is a prior probability distribution, because this is what Bayes' Theorem and the axioms of probability require. How you interpret this probability is up to you, but the fact that it is probability is not up for debate. None of your references provide any support for you to dispute that a prior is a prior probability distribution.

EliRabett said...

There is no requirement that any prior have infinite wings.

James Annan said...

Eli,

Agreed.

In response to your previous, there is no problem if the likelihood happens to be non-zero somewhere that the prior is zero - this just means you have prior knowledge that such values are impossible even though these data do not rule them out.

If the likelihood is *only* non-zero where the prior is zero (equivalently: is zero everywhere that the prior is non-zero), then you *do* have a problem, and had better go back and work out what it was. It may not be the prior!
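Both cases can be seen in a small grid calculation. This is a toy sketch with a hypothetical hard limit x <= 5 in the prior and a Gaussian likelihood centred at 4.5: the posterior mass beyond the limit is exactly zero, and if the likelihood were non-zero only beyond the limit, the normalising constant would vanish and the calculation fails, flagging that something needs re-examining.

```python
import math

def posterior_on_grid(prior, likelihood, grid):
    """Normalise prior * likelihood over a grid of parameter values."""
    weights = [prior(x) * likelihood(x) for x in grid]
    evidence = sum(weights)
    if evidence == 0.0:
        # Likelihood non-zero only where the prior is zero: go back and
        # work out what went wrong (data, model, or prior).
        raise ValueError("prior and likelihood do not overlap")
    return [w / evidence for w in weights]

def limited_prior(x):
    return 1.0 if x <= 5.0 else 0.0          # hypothetical theoretical limit

def gaussian_lik(x):
    return math.exp(-0.5 * (x - 4.5) ** 2)    # data also allow x > 5

grid = [0.01 * i for i in range(1, 1001)]     # x in (0, 10]
post = posterior_on_grid(limited_prior, gaussian_lik, grid)
above = sum(p for x, p in zip(grid, post) if x > 5.0)
print("posterior mass above the limit:", above)
```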

EliRabett said...

The problem is that the answer you get depends on the prior you start with and you have no way of objectively choosing the prior. You showed this in the example of the broad uniform prior. Eli pointed out another example. So, unless you come up with a qualifying procedure for priors it reduces to whom do you trust which is not a very comfortable place.

David B. Benson said...

One always has an ample supply of prior knowledge, some of which may be relevant for the task at hand.

For example, the available energy cannot be very large (as nothing explodes) nor can the inertia be very large (as nothing dashes quickly away).

skanky said...

"The problem is that the answer you get depends on the prior you start with and you have no way of objectively choosing the prior."

A succinct description of the blogosphere.

crandles said...

Eli,

>"you have no way of objectively choosing the prior ... it reduces to whom do you trust which is not a very comfortable place."

Yes, different people with different subjective views will get different answers. That's life. How do you deal with it?

If different people's answers are very close, great: we know the answer within a reasonable uncertainty range.

If the answers are very different then the data isn't yet good enough.

If you still need to know what to think, then you need to form a subjective opinion. Presumably you will know what period your data relates to and when it was first available. If that data is fairly recent, you can look at what people thought before it was available, since it is difficult to see how the data could have affected people's opinions before then.

James did precisely this, going back to Arrhenius and working forward to the start of the data used. This should give a reasonable basis for judging what a reasonable prior is. E.g., I feel the Cauchy distribution James used has tails that are too fat.

EliRabett said...

"That's life. How do you deal with it?"

Beaches and beer.

OTOH, it is clear that a single person coming up with a prior is a weak method. Whose guru do you trust? Some sort of Delphi process (aka IPCC) would be superior.

Eli's argument is that the construction of the prior should be structured.
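One simple structured alternative to a single person's prior is a linear opinion pool, which averages several experts' distributions (a standard pooling rule, not the Delphi process itself). In this sketch the three lognormal expert priors and the U[0,20] outlier are hypothetical; since the tail probability of a mixture is the weighted average of the individual tails, one outlier can dominate the pooled P(S > 6).

```python
import math
from statistics import NormalDist

def lognormal_tail(threshold, median, sigma_log):
    """P(S > threshold) for a lognormal prior with the given median."""
    z = (math.log(threshold) - math.log(median)) / sigma_log
    return 1 - NormalDist().cdf(z)

def uniform_tail(threshold, upper):
    """P(S > threshold) for a uniform prior U[0, upper]."""
    return max(0.0, upper - threshold) / upper

def pooled_tail(tails, weights=None):
    """Linear opinion pool: the pooled tail probability is the weighted
    average of the individual experts' tail probabilities."""
    if weights is None:
        weights = [1 / len(tails)] * len(tails)
    return sum(w * t for w, t in zip(weights, tails))

experts = [lognormal_tail(6, 3.0, 0.4),   # hypothetical expert A
           lognormal_tail(6, 2.5, 0.5),   # hypothetical expert B
           lognormal_tail(6, 3.5, 0.3)]   # hypothetical expert C
outlier = uniform_tail(6, 20)             # a "uniform U[0,20]" respondent

print("pool without outlier:", round(pooled_tail(experts), 3))
print("pool with outlier:   ", round(pooled_tail(experts + [outlier]), 3))
```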

crandles said...

>"single person [...] is a weak method.
...
construction of the prior should be structured"

I agree that if different results can be obtained by using different reasonable priors, a multi-person derived prior would be better.

If you don't get much different results then is there much point bothering with a structured effort at creating a prior?

(I posed a question above of whether useful for comparing different analysis methods but no reply.)

James maintains that there is little difference in results and certainly doesn't appear to want to make such an effort to create a multi-person prior.

Eli, if you are not yet ready to draw the conclusion that reasonable priors make little difference, how long or for what events are you going to wait? For a paper criticising James' paper?

If you do draw the conclusion that different reasonable priors make little difference, why are you calling for the prior to be structured? Comparing analysis methods or something else?

EliRabett said...

Crandles, Eli is not talking about a specific case, but in general about Bayesian methods. We agree that if the situation is well understood then constructing the prior will not be much of a problem. Otherwise. . .

EliRabett said...

A comment from a ray @ RR

"If the prior distribution, at which I am frankly guessing, has little or no effect on the result, then why bother; and if it has a large effect, then since I do not know what I am doing how would I dare act on the conclusions drawn?"--Richard Hamming

When using a minimally informative prior, isn't Bayesian methodology really just a trick for turning likelihood into a probability?

James Annan said...

The problem is, when you get down to the details, "minimally informative" doesn't actually exist.

You give me a prior on S, I will tell you to the nth decimal place what that implies about S. Even if you thought your prior was "minimally informative".

Chris certainly gets it.

BTW, my Cauchy prior was intended as an extremely alarmist sensitivity test, and it demonstrates the robustness of our analysis. But if you choose a sufficiently extreme prior (such as uniform, in this case) then you can break any Bayesian analysis.

I don't want to create some supposedly "consensus" prior because it would be too easily dismissed as just one person's opinion - and if I did a survey, what would I do with those who continue to stubbornly insist on uniform, even after I've shown why it doesn't work? I've learnt from bitter experience that people are much more likely to accept things once they have worked through them for themselves.

EliRabett said...

There are procedures such as Delphi for handling outlier priors