Friday, January 13, 2006

Probability, prediction and verification I: Uncertainty

I'm going to spend some time addressing some issues in probabilistic climate prediction which have been bouncing around a few other blogs lately. I'll start with a comment about uncertainty.

Uncertainty can be broadly split into two categories: aleatory, and epistemic. The former is the sort of irreducible randomness that cannot be reduced by improved measurements, such as the outcome of a (mythical?) "fair coin toss", or the time to decay of a radioactive atom. The latter is the uncertainty that relates to our ignorance of the system, be it due to limited observations, a lack of understanding or approximations and errors in our models. We can reasonably hope that this uncertainty can be reduced by increasing our knowledge in various ways.

Almost all elementary probability theory is presented in terms of the frequentist approach - random coin tosses, or repeated samples from a well-defined distribution, or similar. In practical problems, however, almost all our uncertainty has an epistemic component. Even the fair coin toss could perhaps be predicted, if one observed the initial trajectory with sufficient precision. In fact it might not be too much of an exaggeration to say that aleatory uncertainty is limited to maths problems such as describing the pdf of the number of heads in 10 tosses of a fair coin. Epistemic uncertainty, in contrast, is near-ubiquitous. For instance, what is climate sensitivity? Obviously this is not an intrinsically random variable - merely an imperfectly known one.
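The coin-toss example above can be made concrete. This is my illustration, not from the post: the pdf (strictly, the probability mass function) of the number of heads in 10 tosses of a fair coin is a pure maths problem, fixed by assumption rather than by imperfect knowledge - aleatory uncertainty in its cleanest form.

```python
from math import comb

# pmf of the number of heads in n tosses of a fair coin: a textbook
# case of purely aleatory uncertainty, since the distribution is fixed
# by the premise "fair coin" rather than learned from the world.
n, p = 10, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

assert abs(sum(pmf.values()) - 1.0) < 1e-12  # probabilities sum to 1
print(pmf[5])  # most likely outcome: 5 heads, 252/1024 = 0.24609375
```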

The frequentist interpretation of probability only really applies to aleatory uncertainty. The pdf of the number of heads in 10 coin tosses can be estimated by repeated samples of 10 coin tosses, forming a histogram of the number of heads. Arbitrary precision can be achieved by increasing the number of trials (not the cleverest way of solving this particular problem, but never mind about that). For climate prediction, we have one planet, and even though Monte Carlo methods and "perturbed physics ensembles" have a pseudo-frequentist appearance, we must not lose sight of the fact that the underlying answer to "what is climate sensitivity?" is actually a single real number, not a distribution. The distribution is merely an artefact of our current ignorance, and we might hope that it will converge in the (near) future. So in practice climate scientists generally (universally?) adopt an explicitly Bayesian approach to estimation. Note also that the Aleatory Probability page on Wikipedia redirects to Frequency Probability, but Epistemic Probability leads to Bayesian Probability.
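For what it's worth, the "repeated trials" approach can be sketched in a few lines - a deliberately naive Monte Carlo estimate (my illustration), which converges on the exact binomial answer as the number of trials grows:

```python
import random
from collections import Counter
from math import comb

random.seed(0)  # for reproducibility

def heads_in_trial(n=10):
    """One sample: the number of heads in n fair coin tosses."""
    return sum(random.random() < 0.5 for _ in range(n))

# Frequentist estimation: repeat the experiment many times and read the
# probability off the histogram. Precision improves with more trials.
trials = 100_000
counts = Counter(heads_in_trial() for _ in range(trials))
estimate = counts[5] / trials      # estimated P(exactly 5 heads)
exact = comb(10, 5) / 2**10        # the true value, 0.24609375
print(estimate, exact)             # the gap shrinks as trials grows
```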

In weather prediction, a quasi-frequentist interpretation looks possible at first glance. A forecast that says "70% chance of rain tomorrow" actually means "we think that of the N times we make this statement, roughly 0.7N times will have rain". However, there is nothing magical about this particular assignment of probabilities to these N days. A better forecast system might segregate the N days into 0.5N forecasts of "90% chance of rain" and 0.5N of "50% chance of rain". A badly calibrated forecasting system might say "50% chance of rain" for all of them :-) In each case, tomorrow's weather is actually a deterministic event, entirely determined by the current atmospheric state, and it's either going to rain or it isn't. The decision to assign a 70% probability to a particular forecast cannot be based on any fundamental randomness in the atmospheric system (because there is none), and the hypothetical "probability distribution of the current atmospheric state" from which a forecasting system attempts to predict the future does not exist as some physical reality, but only as a useful theoretical construct to describe our ignorance.
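The point that the "better" forecast system above really is better can be checked with a standard verification measure, the Brier score (mean squared difference between forecast probability and outcome). This toy simulation is my own sketch of the post's example: a hypothetical world where half the days genuinely carry a 90% rain chance and half a 50% chance, so the blanket "70%" forecast is calibrated overall but the sharper forecast scores better.

```python
import random

random.seed(1)

N = 100_000
# Hypothetical world matching the example: half the days have a true
# 90% rain chance, half a true 50% chance (overall rain rate 70%).
true_p = [0.9 if i % 2 == 0 else 0.5 for i in range(N)]
rain = [random.random() < p for p in true_p]

def brier(forecasts):
    """Mean squared error of forecast probabilities vs outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, rain)) / N

coarse = brier([0.7] * N)   # always "70%": calibrated, but unresolved
sharp = brier(true_p)       # splits into "90%" and "50%" days
print(coarse, sharp)        # the sharper, equally calibrated forecast wins
```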

Occasionally one finds childish rants on the web about how "Bayesian probability is not science", but it would probably be less wrong to say that science can only use Bayesian probability, and a purely frequentist approach is limited to mathematical problems in textbooks. I am sure that despite his comments, the author of the linked rant makes use of the weather forecast :-)


Anonymous said...

Thanks, James, that was nicely written. Will you be getting to the Cohn and Lins paper as part of this? As you probably know, it's been the subject of a multitude of even more childish rants (back-handed compliment to Lubos!) over at you-know-where.

James Annan said...

Thanks Steve,

I've no plans for C&L, but we'll see how things go. It seems to me that the main content of their paper was basically valid, but they dressed it up in sceptical-sounding sentences (somewhat tangential to, and exaggerating the impact of, the science itself) for effect. The RC post tried to over-reach and turned into a bit of a train wreck IMO.

Brian said...

Would you say there's a difference between defined and undefined aleatory uncertainty? A fair coin toss is irreducibly random, but has defined 50:50 odds. Contrast that to a computer program that will use some unknown random process to choose between the words "warmer" or "cooler", but you don't know what the odds allocation will be between the two words.

I think this is the heart of the argument for people who say there's no reason to believe in anthro global warming AND say that 2:1 betting odds, offered to them against it getting colder in 10 years, is meaningless.

I doubt they really believe it, or else they'd also reject 20:1 odds, and I expect they'd jump all over those odds (like I would). It's still their argument though.

Luboš Motl said...

I only use the "chances of rain" as a very rough estimate of what someone who has looked at the situation much more quantitatively - and with the help of computers - thinks about the weather tomorrow. It is about a subjective feeling of someone who may be trusted more than myself in this particular question.

It is just an argument that affects the question of whether I will take an umbrella to Seattle tomorrow - still undecided, by the way. ;-) If they told me whether it will be raining above the campus on Monday morning and evening instead of the number 50%, it would be more useful. ;-)

The precise value of these "chances" has absolutely no meaning, and a different weather forecaster with different models would surely give you different figures.

The numbers are only quantitatively meaningful to the extent that they can be defined as frequentist probabilities (and experimentally verified) - which is partly possible in the weather once you define the ensemble of situations more accurately. The trade-off is completely clear: either you allow uncontrollable, model-dependent, and subjective effects to influence your estimates, and then your numbers are not science. Or you keep things scientific, but then you are forced to admit that you simply don't know the answers to many questions instead of generating random figures representing "chance".


James Annan said...

But Lumo,

You just admitted that different forecasters could give different probabilities of rain (and they frequently do). So of course the weather forecast depends on "model-dependent, and subjective effects". And weather prediction uncertainty is not an aleatory uncertainty in any case, given that the atmospheric dynamics are deterministic. How can you possibly hope to treat today's forecast in a frequentist manner? Tomorrow's forecast is a physically distinct problem, not a replicate of today. Yet weather forecasting is an example of predictive science at its best - there's no shying away from reality, and the results prove their value on a daily basis. You may choose to only use the forecast as a "rough estimate", but many others whose livelihoods depend on it (such as farmers, energy companies) make directly quantitative interpretations of the probabilistic forecasts, and would be foolish not to do so.
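The "directly quantitative interpretation" mentioned above is usually framed as the standard cost/loss decision model. This sketch is my illustration, not something from the thread: take protective action (the umbrella, frost protection, reserve generation) whenever the forecast probability exceeds the ratio of the cost of protecting to the loss incurred if unprotected.

```python
def should_protect(p_event, cost, loss):
    """Standard cost/loss rule: protect iff the expected loss p*loss
    exceeds the certain cost of protecting, i.e. p > cost/loss."""
    return p_event > cost / loss

# Hypothetical numbers: carrying an umbrella costs 1 unit of hassle,
# getting soaked costs 5, so the threshold is 1/5 = 0.2.
print(should_protect(0.7, 1, 5))   # True: 70% chance easily clears 20%
print(should_protect(0.1, 1, 5))   # False: not worth the bother
```

The same forecast probability thus drives opposite decisions for users with different cost/loss ratios, which is exactly why a single probability number is more useful than a bare yes/no forecast.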

Outside of mathematics, we never "know" the answers to any questions, without acknowledging some uncertainty in our answer. How old is the universe, BTW? The Earth? Are these intrinsically unscientific questions?


Without knowing the odds, you can't give a frequentist answer (and even if you do a frequentist experiment, you can only approximate the odds - so no-one can ever tell that a coin really is unbiased, except when this is given as a premise in mathematical problems). I don't think the people you are trying to debate with are worried about such philosophical problems, they are simply denialists who refuse to examine or acknowledge the inconsistency of their own positions...