At last, and after getting slightly sidetracked in various ways, I'll get back to the meat of things.
Forecast verification is the act of checking a forecast against reality, to see how good it was. The basic aim is to see whether the forecast was valid, in the sense that reality did not throw up any major surprises. You don't want a forecast that is confident of sunshine and warmth when reality turns out cold and rainy.
Obviously, for any current forecast, this check cannot even be attempted before the valid time of the forecast has arrived. So anyone who complains that today's multidecadal climate forecasts cannot be verified is merely stating a truism based on the definitions of the terms. A weather forecast also cannot be verified in advance of its valid time (say, tomorrow). But this in itself obviously does not mean that weather forecasts cannot be trusted and used. On the contrary, they prove themselves to be highly valuable on a daily basis, with industries ranging from agriculture to the military depending heavily on them. (That's despite there being ultimately no objective rigorous basis for the way in which the epistemic uncertainty in weather prediction is handled, as I've explained in more detail here, here and here.)
In fact, even after the valid time of the forecast has passed, and even assuming that precise observations are available, verification is still not a trivial matter. Returning to my previous example of a rain forecast, if the forecast said "70% chance of rain", then either a rainy or dry day is an entirely acceptable outcome. So was that forecast inherently unverifiable? The inevitable answer is that yes, of course it was! Even for a quantitative forecast ("tomorrow will have a max of 12C, with an uncertainty of 1C"), it will only be on the rare occasion that the observed temperature falls far enough outside the plausible range of forecast uncertainty that one might be able to say that the forecast failed to verify. In fact, if we assume the forecast uncertainty is Gaussian (or any other continuous unbounded distribution), there is no threshold at which the forecast fails to verify in absolute terms - you might simply have got the 1 chance in 1000 that the target was 3 standard deviations from the forecast mean. Indeed, with one forecast every day, you'd expect to see this roughly once every 3 years. [Note that we check whether the data lie within the uncertainty of the forecast, not whether the (central) forecast falls within the observational uncertainty of the data - see here for more on this.]
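To make that concrete, here is a minimal sketch (in Python, with entirely made-up numbers) of how one might check a single Gaussian temperature forecast against its verifying observation, and how long you'd expect to wait before a miss of that size turned up even in a perfectly reliable system:

```python
from scipy.stats import norm

# Hypothetical forecast: mean 12 C with a 1 C (1-sigma) Gaussian uncertainty,
# and a hypothetical verifying observation.
forecast_mean, forecast_sd = 12.0, 1.0
observed = 15.3

# How far outside the stated uncertainty did the observation fall?
z = abs(observed - forecast_mean) / forecast_sd

# Probability of a miss at least this large, if the forecast were perfectly reliable
p_miss = 2 * norm.sf(z)
print(f"observation is {z:.1f} sigma from the forecast mean, p = {p_miss:.4f}")

# With one forecast per day, how long before such a miss turns up by chance anyway?
print(f"expected roughly once every {1 / (p_miss * 365.25):.1f} years")
```

The point, of course, is that a small p here still doesn't falsify the individual forecast - it only tells you how surprised you should be.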
Once you have more than a handful of forecasts, however, you can usually make a realistic assessment of the reliability of the system as a whole - if many days are 3 standard deviations from the mean, you'll probably judge it more likely that the system is bad than that you happened to hit the 1 in 10^100 unlucky streak in a good system :-) But the latter can never be truly proven false, of course. Conversely, if the forecast system has verified consistently over a period of time, we will probably trust today's forecast, but even if the system is known to be statistically perfect, there is still a 1 in 1000 chance that it will be 3 standard deviations wrong tomorrow. Each day is a unique forecast based on the current atmospheric state, which has not occurred before. As I explained before, the forecast uncertainty is fundamentally epistemic not aleatory, so there is no sense in which there is a "correct" or "objective" probabilistic forecast in the first place. The uncertainty is fundamentally a description of our ignorance, not some intrinsic randomness.
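As a sketch of what that system-level assessment might look like (simulated forecasts standing in for a real archive - none of these numbers come from an actual system), one can simply count how often the observations fall within 1, 2 and 3 stated standard deviations and compare against what a reliable Gaussian forecast would give:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulated archive of 1000 daily forecasts: stated means and 1-sigma uncertainties,
# with verifying observations drawn from a slightly overconfident system
# (the true spread is wider than the stated one).
n = 1000
means = 10.0 + 5.0 * rng.standard_normal(n)
sigmas = np.full(n, 1.0)
obs = means + 1.3 * sigmas * rng.standard_normal(n)

# Compare observed coverage against what a perfectly reliable Gaussian forecast gives
z = np.abs(obs - means) / sigmas
for k in (1, 2, 3):
    expected = 2 * norm.cdf(k) - 1
    print(f"within {k} sigma: {np.mean(z <= k):.3f} (expected {expected:.3f} if reliable)")
```

With enough cases, a systematic shortfall in coverage is far more plausibly a fault of the system than a run of bad luck.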
Obviously it would take a long time to collect adequate statistics from successive 100-year climate forecasts if we started now. And given the rate of ongoing model development, this approach could never tell us much about the skill of the most up-to-date model anyway, since models are superseded every few years. We can, however, use simulations of the historical record (and the present) to test how well the models can hindcast variations in the climate which are known to have occurred. In its simplest form, this sort of test provides only a lower bound on forecast errors, since the models are largely built and tuned to simulate the existing observational data.
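The mechanics of such a hindcast check are simple enough; here is a sketch (the anomaly values are invented, purely to show the bookkeeping) comparing a model hindcast against observations and scoring it relative to a trivial climatological forecast:

```python
import numpy as np

# Illustrative annual temperature anomalies (deg C): observations and a model hindcast.
# The numbers are made up; only the mechanics of the comparison matter here.
obs      = np.array([0.10, 0.18, 0.12, 0.25, 0.31, 0.28, 0.40, 0.45])
hindcast = np.array([0.12, 0.15, 0.20, 0.22, 0.30, 0.33, 0.37, 0.48])

rmse = np.sqrt(np.mean((hindcast - obs) ** 2))

# Skill relative to always forecasting the climatological mean
# (1 = perfect, 0 = no better than climatology)
rmse_clim = np.sqrt(np.mean((obs - obs.mean()) ** 2))
skill = 1.0 - rmse / rmse_clim
print(f"RMSE = {rmse:.3f} C, skill score vs climatology = {skill:.2f}")
```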
When the models fail to reproduce the data, of course it calls their validity into question - at least, it does if the data are reliable. A striking example of models teaching us about reality is the recent resolution of the tropospheric data/model incompatibility in favour of the models (OK, I'm over-egging things a little perhaps). Looking back over a longer timescale, we have Hansen's famous forecast from 1988, which has proved to be spot on over the subsequent 17 years. In fact, the simplicity of the physics means that one thing we really can forecast quite confidently is continued global warming in the coming decades: the IPCC TAR said it was likely to continue at 0.1-0.2C/decade for several decades to come, and although this perhaps could be nudged marginally higher (we are getting close to the 0.2 limit), it won't be far wrong.
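By way of illustration (with synthetic numbers, not the real observational record), verifying a rate forecast like the TAR's 0.1-0.2C/decade just amounts to fitting a trend to the subsequent observations and seeing whether it falls within the stated range:

```python
import numpy as np

# Synthetic annual global-mean anomalies with a built-in trend plus noise,
# standing in for the real record.
rng = np.random.default_rng(1)
years = np.arange(1990, 2006)
anomalies = 0.015 * (years - 1990) + 0.25 + 0.05 * rng.standard_normal(years.size)

# Fit a linear trend and express it per decade
slope_per_year, _intercept = np.polyfit(years, anomalies, 1)
print(f"fitted trend: {10 * slope_per_year:.2f} C/decade "
      f"(forecast range: 0.10 to 0.20 C/decade)")
```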
A slightly more sophisticated general technique known as cross-validation involves withholding some historical data, training the model on the rest of the data, and seeing if it correctly predicts the data which were withheld. In order to avoid accusations of cheating, it is necessary to use some sort of automatic tuning technique. If the data take the form of a time series which is split into an initial training interval followed by a forecast interval, then this accurately mimics the situation of a real forecast. It is also how new versions of weather prediction systems are tested prior to introduction - repeat the forecasts of the past year (say), and adopt the new system if it shows greater skill than the current one. We demonstrated a simple example of this cross-validation approach in this paper a few years ago, and broadly similar methods can be found throughout the more prediction-focussed corners of the climate research literature (eg Reto Knutti used a neural network in this forthcoming paper, training it on half the data and verifying it on the other half). This sort of formal forecast methodology has not been widely undertaken in the GCM-building community in the past, partly because until recently there were no computationally-affordable automatic tuning methods, and partly because most climate scientists don't have much of a background in prediction and estimation - they are primarily physical scientists with an interest in understanding processes, rather than forecasters whose main aim is to predict the future. But there is now plenty of work going on to bridge this gap, and here's the obligatory plug for the modest contribution we're making in this area :-) We may never get to the level that weather forecasting is at, in terms of attaching clear and reliable probabilities to all of our predictions, but we are definitely making progress.
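As a toy version of the cross-validation idea described above (nothing to do with the specific methods in the papers linked), one can "train" a simple statistical model on the early part of a time series and verify it against the withheld remainder:

```python
import numpy as np

# Synthetic annual series: trend plus noise, standing in for a climate record.
rng = np.random.default_rng(2)
years = np.arange(1960, 2006)
series = 0.012 * (years - 1960) + 0.1 * rng.standard_normal(years.size)

# Withhold the final 10 years; train (here, fit a linear trend) on the rest.
train_years, test_years = years[:-10], years[-10:]
train_data,  test_data  = series[:-10], series[-10:]
coeffs = np.polyfit(train_years, train_data, 1)

# Verify: compare out-of-sample predictions against the withheld data.
predictions = np.polyval(coeffs, test_years)
rmse = np.sqrt(np.mean((predictions - test_data) ** 2))
print(f"out-of-sample RMSE over the withheld decade: {rmse:.3f}")
```

Because the withheld data play no part in the fitting, the resulting error is a much more honest estimate of forecast skill than any in-sample fit.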