Ok, having got various things out of the way, on with the show.
I liked this letter which appeared in Nature recently. Not just because I'd done something similar myself with the earlier Hansen forecast :-) In general, I think it's important to revisit historical statements to see how well they held up. Allen et al have gone back to a forecast they made about 10 years ago, and checked how well it matches up to reality. The answer is...
really well. On the left is the original forecast with new data added, and on the right is the same result re-expressed relative to a 1986-96 baseline. The forecast was originally expressed in terms of decadal means, so I don't think there is anything untoward in the smoothing. The solid black line in the left plot is the original HadCM2 output, with the dashed line and grey region representing the adjusted result after fitting to recent (at that time) obs using standard detection and attribution techniques.
They also compared their forecast to a couple of alternative approaches:
This plot shows the HadCM2 forecast (black), CMIP5 models (blue) and another possible forecast of no forced trend, just a random walk (green). The red line is the observed temperature. They point out that their forecast performed better than the alternatives, in the sense that it assigned higher probability (density) to the observations.
So far, so good. However, I disagree with their statement that "the CMIP5 forecast also clearly outperforms the random walk, primarily because it has better sharpness" (my emphasis). Actually, the CMIP5 forecast outperforms the random walk simply because it is clearly much closer to the data. The CMIP5 mean is about 0.41 in these units (all these numbers are just read off the graph, and may not be precise), the random walk is of course 0, and the observed anomaly is 0.27. The only ways a forecast based on the CMIP5 mean could have underperformed the random walk would have been if it was either so sharp that it excluded the obs (which in practice would mean a standard deviation of 0.06 or less, resulting in a 90% range of 0.31-0.51), or so diffuse that it assigned low probability across a huge range of values (ie, a standard deviation of 0.75 or greater, with associated 90% range of -0.8 to 1.6). The actual CMIP5 width here seems to be close to 0.1, well within those sharpness bounds.
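For anyone who wants to poke at those numbers, here is a rough sketch of the calculation, treating both forecasts as Gaussians. The means are just the values read off the graph, and the random-walk spread (sd_rw below) is an assumption I've made purely for illustration, so the exact crossover points will wobble a bit with that choice.

```python
# Rough check of the sharpness bounds above, treating both forecasts as Gaussians.
# Numbers are read off the graph; sd_rw is an illustrative assumption, not from the paper.
import numpy as np
from scipy.stats import norm

obs = 0.27          # observed decadal-mean anomaly (approximate)
cmip5_mean = 0.41   # approximate CMIP5 ensemble mean
sd_rw = 0.75        # assumed spread of the zero-mean random-walk forecast

# Density the random-walk forecast assigns to the observation
p_rw = norm.pdf(obs, loc=0.0, scale=sd_rw)

# Scan CMIP5 forecast widths and see where its density beats the random walk's
widths = np.linspace(0.01, 1.5, 1000)
p_cmip5 = norm.pdf(obs, loc=cmip5_mean, scale=widths)
beats = p_cmip5 > p_rw

print(f"random-walk density at the obs: {p_rw:.2f}")
print(f"CMIP5 beats the random walk for widths from "
      f"{widths[beats].min():.2f} to {widths[beats].max():.2f}")
# Only a very sharp forecast (which excludes the obs) or a very diffuse one loses
# to the random walk; a width of ~0.1 is nowhere near either limit.
```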
I do think I know what the authors are trying to say, which is that if you are going to be at the 5th percentile of a distribution, it's better to be at the 5th percentile of a sharp forecast than a broad one. But changing the sharpness of the forecast based on CMIP5 would obviously mean the obs were no longer at the 5th percentile! In fact, despite not quite hitting the obs, the CMIP5 forecast is not that much worse than the tuned forecast (black curve), thanks to being quite sharp. And according to the authors' own estimation of how much uncertainty they had in their original forecast, they obviously got extraordinarily lucky to hit the data so precisely. With their forecast width, it would have been almost impossible to miss the 90% interval - this would have required a very large decadal jump in temperature. I don't think it is reasonable to say that one method is intrinsically better than the other, on the basis of a single verification point that both methods actually forecast correctly. If the obs had come in at say 0.4 - which they forecast with high probability - I hardly think they would have been saying that the CMIP5 ensemble of opportunity was a superior approach.
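To make that percentile point concrete: for a Gaussian forecast the density at its own 5th percentile scales as one over the standard deviation, so sharpening the forecast does raise the score at that percentile - but it also moves the percentile itself, which is the catch. A minimal sketch, again assuming Gaussian forecasts and the rough CMIP5 mean from the graph:

```python
# Density at a Gaussian forecast's own 5th percentile scales as 1/sd: sharper
# forecasts score better *at that percentile*, but the percentile itself moves.
from scipy.stats import norm

mean = 0.41                                      # rough CMIP5 mean read off the graph
for sd in (0.2, 0.1, 0.05):
    q05 = norm.ppf(0.05, loc=mean, scale=sd)     # where the 5th percentile sits
    dens = norm.pdf(q05, loc=mean, scale=sd)     # density assigned there
    print(f"sd={sd:.2f}: 5th percentile at {q05:.2f}, density {dens:.2f}")
# Halving the width doubles the density at the 5th percentile, but a fixed
# observation (0.27 here) would then no longer lie at the 5th percentile.
```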
(For what it's worth, I think the method used in this forecast intrinsically has exaggerated uncertainty, but that's another story entirely.)
15 comments:
To me what's interesting here is that if the observations had been only slightly lower, then the probability from the null would have been higher than from the CMIP5 forecast. You know, when you talk about decadal averages, the bar is set rather low, I would argue.
Yes indeed, if global warming weren't happening then it would be rather hard to reject the null hypothesis ....
Cynical and sarcastic, aren't we? My point is that the difference between the skill of CMIP5 and the null hypothesis is not that large. In any event, I'm sure Allen would admit that hitting the decadal average right on the head is fortuitous. So, I'm not sure what the importance of this is from the larger perspective.
I'm reminded of the dictum that even a broken clock is right twice a day. Treating this as some sort of vindication of climate models is not only unscientific but is merely grasping at straws.
The scientifically interesting question is: what about Myles' model caused this coincidence? Probably it's too complex and muddled to really say much. And that's part of the problem here. But I'd be very interested to hear it if James or anyone else wants to say something about it.
Well I think part of the issue (that I hinted at in the post) is that the method intrinsically exaggerates uncertainty, basically because the only data it uses is a bit of the surface temperature record (eg ignoring ocean heat uptake), and it also does not take account of anything else that we know about the system (ie prior knowledge if we were taking a Bayesian approach). So their forecast could reasonably have been a bit narrower, in which case the good fit wouldn't seem so lucky (irrespective of whether the mean would have shifted a bit or not).
The random walk forecast has to be broad, if you are trying to account for past changes as purely due to internal variability.
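Just to illustrate how broad that ends up being (the step size below is an arbitrary choice for illustration, not a value fitted to anything): even a modest year-to-year random-walk step accumulates into a wide spread for the next decadal mean.

```python
# Why the random-walk null is broad: step variance accumulates, so even a modest
# annual step size (0.1 C here, an arbitrary illustrative choice) gives a wide
# spread for the forecast decadal mean.
import numpy as np

rng = np.random.default_rng(0)
step_sd, n_years, n_sims = 0.1, 10, 100_000

steps = rng.normal(0.0, step_sd, size=(n_sims, n_years))
walks = np.cumsum(steps, axis=1)          # anomaly relative to the baseline decade
decadal_means = walks.mean(axis=1)        # forecast quantity: the next decadal mean

print(f"spread (sd) of the forecast decadal mean: {decadal_means.std():.2f} C")
# ~0.2 C here, and the spread has to be larger still if the walk is also asked to
# account for the observed century-scale warming as pure internal variability.
```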
Another point that perhaps shouldn't be overlooked is that although they only used data up to 1996, they published in 2000, by which time they had already seen some ongoing warming at about the right rate (including the 1998 El Niño, but also some more ordinary years). One might speculate that if observations had been heading off-screen at an alarming rate (in either direction), they wouldn't have submitted the work...
Does the superiority of the Allen and the CMIP5 models over the random walk tend to disprove claims of a "pause" in global warming?
Wouldn't a "pause" in global warming imply a surface temperature described by a random walk?
joycet
Well, I don't really like to talk in terms of proving or disproving, based on basically a single evaluation point (decadal average). Also, note that this assessment is based on a 1986-96 baseline, and most talk of the pause is post-2000 (or cynically cherry-picking, post 1998).
But I'd agree with the sentiment that there hasn't been anything worthy of the term "pause", just a bit of a slowdown.
To first order the forecast is a product of the climate sensitivity and the emissions scenario forcing. To (slightly) better evaluate a model using a forecast you have to correct the assumed forcing with the real one. That, of course, is the lesson of Hansen et al. 1988, where the sensitivity was much too high (4.2 C/doubling CO2 or some such).
Yebbut actually you really want the transient response (TCR) rather than the sensitivity. Hansen's model was high on both counts, probably even worse (relatively speaking) for TCR but it's hard to be sure.
HadCM2 had a rather low equilibrium sensitivity (2.5) but actually quite a mid-range transient response (1.7), and scaling the latter down by 20% for the central forecast in this paper (0.8 × 1.7 ≈ 1.36) gives a value rather close to many people's best estimate for the TCR of 1.4ish.
Just as a brief reminder: HadCM2 does not include indirect aerosol effects. The omission of counterbalancing BC forcing makes up for some of the discrepancy, but at least their 1990 aerosol forcing (as published in Johns et al. CD 1997) is very likely an underestimate. I would generally be more cautious regarding the aerosol forcing. The notion that AR5 reduces the aerosol forcing is, in my view, a bit misleading. The adjusted forcing is still fairly high (-0.9 W/m2). However, the reduced uncertainty range (if it remains that way) is way too optimistic.
The latest MACC results (still a model, but nudged towards observed AOD) are also not particularly comforting in this regard. The total AOD is up, and part of it is of anthropogenic origin, reflected in a small upward trend in the negative forcing. What's more, we have Fyfe et al. GRL 2013 who argue that the stratospheric AOD has likewise increased. On the other hand we have Klimont et al. ERL 2013 who claim that sulfate emissions have decreased. The decreasing AOD might have been compensated by quite a substantial increase of AOD over the Arabian Peninsula due to mineral dust, a feature which seems rather poorly captured by MACC. The forcing estimates might change accordingly.
Therefore these numbers aren't the final word. Nonetheless, I wouldn't bet on the lower aerosol estimates. I agree that we can almost certainly rule out the high end estimates. I also agree that the latest OHC estimates (Balmaseda et al. GRL 2013) only bring observations closer to the models. But the good agreement between models and observations over land (see Geert Jan) indicates that they get the "easier" part of the story right. The oceans seem to be the main problem. Coincidentally enough, I'd finally like to recall that the indirect aerosol effect is considered to be strongest over the oceans too (with the North Pacific affected the most in the last decade ... in case MACC gets it at least partly right).
Karsten,
I think the Allen et al. forecast was more involved than you're assuming. They took HadCM2 historical outputs and observations over 1946-1996, then compared spatio-temporal temperature patterns. From this comparison they derived scaling factors for each of WMGHGs and sulphate aerosols. These scaling factors were then applied to the IS92a scenario in order to make a forecast.
It therefore doesn't matter (within the logic of the study) that the global net aerosol forcing in HadCM2 was probably less negative than reality: the method is supposed to detect the "true" aerosol effect from the aforementioned spatio-temporal comparison. As it happens, the analysis found a best-estimate scaling factor of 0.6 for sulphate aerosols - i.e. the impact of sulphate aerosols in HadCM2 was too strong - albeit with a 90% uncertainty range of ~ -0.5 to 2.0 (BTW, I'm going to speculate that James' "another story" has something to do with the priors used to construct the scaling ranges :P).
What does matter for this comparison is that the model produces an accurate spatio-temporal aerosol effect. Given what you've said, it cannot possibly directly provide information about the effects of carbonaceous, nitrate and dust aerosols, nor their indirect effects. Since these will have affected observations, they must be implicitly contained *somehow* in either/both of the WMGHG and sulphate scaling factors. I think it's quite plausible that, for example, the effects of carbonaceous aerosols in the Tropics could have been misinterpreted as lower sensitivity to GHGs - the detected WMGHG scaling factor being ~0.8 (0.3 - 1.4).
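For anyone unfamiliar with the scaling-factor step described above, here is a toy sketch of the idea: regress the observed change onto the model's response patterns, then carry the fitted scalings forward to the scenario projection. This is made-up data and plain least squares, not the optimal fingerprinting machinery actually used (which also weights by an estimate of internal-variability covariance), and the projection numbers at the end are purely hypothetical.

```python
# Toy sketch of the scaling-factor idea: regress observations onto the model's
# GHG and sulphate response patterns, then apply the fitted scalings to the
# scenario projection. Made-up data and ordinary least squares only -- the real
# analysis uses optimal fingerprinting with an internal-variability covariance.
import numpy as np

rng = np.random.default_rng(1)
n = 200                                   # toy number of space-time elements

ghg_pattern = rng.normal(0.0, 1.0, n)     # model response pattern to WMGHGs (toy)
so4_pattern = rng.normal(0.0, 1.0, n)     # model response pattern to sulphates (toy)
noise = rng.normal(0.0, 0.3, n)           # stand-in for internal variability

# Pretend the real world responds a little more weakly than the model to both
obs = 0.8 * ghg_pattern + 0.6 * so4_pattern + noise

X = np.column_stack([ghg_pattern, so4_pattern])
beta, *_ = np.linalg.lstsq(X, obs, rcond=None)
print(f"estimated scaling factors: GHG = {beta[0]:.2f}, SO4 = {beta[1]:.2f}")

# Forecast step: scale the model's scenario-driven contributions (hypothetical
# numbers) by the fitted factors and add them up
ghg_contrib, so4_contrib = 0.30, -0.05
print(f"scaled forecast anomaly: {beta[0] * ghg_contrib + beta[1] * so4_contrib:.2f}")
```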
----------------
Regarding the splitting of model analysis into accurate land/inaccurate ocean: I don't think this is a tenable way to think about things, given the tight coupling of land and sea surface temperatures in models, and the strong asymmetry of this coupling - land temperatures appear to be much more affected by SST changes than directly by forcing. Any "solution" for ocean temperatures will also have big implications for land.
I agree mainly with PaulS here - the thing that matters from the POV of this forecast is whether the overall trend in forcing changed radically after the hindcast period, other than the reasonably steady increase assumed in scenarios. "Missing" factors can be implicitly accounted for by scaling the existing forcings, at least assuming some degree of collinearity.
When you look out to 2050 and beyond, it might start to matter more that the scenario used has a very large 1%pa CO2 increase, offset by a huge increase in aerosol load (compared to more modern scenarios). Even in this case, the net forcing might not be too far out. From Fig 1b, their forecast 50y trend is about 0.1 to 0.25 per decade, which seems unlikely to be wrong.
Thanks PaulS! My case was certainly oversimplified regarding the HadCM2 scaling (as in Allen et al. 2000 (A00) and Stott et al. 2001 (S01)). I have my doubts, however, that the spatio-temporal temperature comparison provides overly reliable results without the indirect effect. From Mitchell et al. 1995 it appears that the sulphate (direct) forcing is too strong over land (not surprisingly) which must have an effect on the fingerprinting analysis.
But there is more: Fig. 4 in A00 corresponds to Fig. 10a in S01, i.e. it is the global (T0) fingerprinting analysis. Results change at T4 (Fig. 11a in S01) or T99 (Fig. 4a in S01). Both T4 and T99 increase the scaling factors for sulphates and WMGHGs for the 1946-1996 period quite considerably. For T99 it is 1.5 (0.5-2.5) instead of 0.8 with T0 for sulphates. Perhaps I'm utterly wrong, but it seems as if reality (i.e. higher spatial resolution) tends to make the sulphate forcing stronger? At least one would expect T99 to reproduce the spatio-temporal aerosol effects better than the global average, regardless of the weaknesses related to the direct forcing assumption. If this is the case, my point still stands. Not sure whether the successful forecast of the last 15 years would be helpful in this regard. Please correct me if I misinterpreted S01 (and A00 as a consequence).
Re the land/ocean splitting: Agreed! It was just a bit surprising for me to see the good match over land (hadn't exactly noticed that before). The increasing land-ocean temperature asymmetry is however slightly suggestive of a considerable forcing component over land. Can any of you point me to a study which aimed at disentangling the SST and forcing impacts upon land temperatures? It would be particularly interesting for Eurasia. Globally, the SST component certainly dominates.
P.S. @PaulS: Haywood et al. 2013 are following up our discussion over at Isaac's blog to some extent. Have to go through the details ...
Karsten,
To be more clear, I definitely concur that their sulphate aerosol scaling factor wouldn't provide an accurate scaling of total sulphate aerosol impacts, even if the method worked perfectly, let alone the impact of all aerosol species. That's because, as you say, indirect effects have different spatial patterns. What I'm suggesting is that those indirect effects, and other aerosol impacts, might have sneaked into the WMGHG scaling instead.
That would be fine if the spatial pattern remained proportionally fixed as forcing increased - even though the scalings would be technically wrong, the compensating errors could still allow a successful forecast. I would add to James' point about a change in forcing trend that a change in the spatial distribution of forcing could be equally damaging to this forecast attempt. That's not really an issue for WMGHGs but definitely is for aerosols. I think the latitudinal shift of aerosols from the NH extratropics to the Tropics over the past couple of decades would count as such a problem.
--------
land/ocean warming: I'm like a broken record on this but I think AMIP model runs can provide some insight. Isaac Held wrote a post looking at land temperature outputs from an AMIP model, where SSTs are prescribed from observations. The output is an excellent match for land surface-air temperature change over the satellite period, which indicates that the apparent land-sea warming discrepancy can be largely explained as a function of the spatial distribution of SST warming over the past few decades.
This is kind of intuitive - if the distribution of energy in the oceans shifts towards the mid-to-high latitude NH, that's where most of the land is, so proportionally you will get land warming faster than if the energy were more evenly distributed.
Handily, Isaac also plots an AMIP run with the same model but without forcings applied to the land. There is still significant warming but noticeably less than with forcing. So, matching observed land warming does appear to require forcing, but not anything particularly exotic once you take into account spatial distribution of SST warming.
PaulS, thanks for the follow up. Being some sort of noob when it comes to Bayesian statistics, I wonder whether I am correct to assume that the application of Bayesian statistics to determine the PDFs of the key forcings and dynamics does not resolve the problem of changing spatial forcing patterns (as long as it relies on models with simple aerosol physics, that is)? I am asking because I just noticed that Nic Lewis got his paper on this subject published. He has chosen the total aerosol forcing prior to lie between +0.5 and -1.5 W/m2 (1860-1980s), which gives him - not surprisingly - a ridiculously low best estimate around -0.4 W/m2 (if I read the paper correctly). Personally, I consider this result implausible and almost certainly a (strong) underestimate. If he wants to convince me of the validity of his method, he has to come up with something better. At least his aerosol forcing prior should be in line with the latest estimates which we discussed earlier (-0.3 to -1.5 W/m2 for 2011 IIRC; barely different in 2001, which is his end year). Or am I completely mistaken in assuming that one is supposed to choose the most likely (forcing) range for the prior (rather than the range which one personally believes to be true)? As far as I know, the prior assumption will heavily affect the posterior results. So I don't really understand why such problems aren't picked up in the review process ...
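To see how much the prior range alone can matter, here is a generic illustration with a made-up, deliberately weak likelihood - this is not a reconstruction of Lewis's actual method or data, just the same hypothetical constraint combined with the two prior ranges mentioned above.

```python
# Generic illustration of prior sensitivity, NOT Nic Lewis's actual method or data:
# the same hypothetical (deliberately broad) likelihood combined with two different
# uniform priors on the aerosol forcing gives noticeably different posterior means.
import numpy as np

forcing = np.linspace(-3.0, 1.0, 4001)                      # aerosol forcing grid (W/m2)
likelihood = np.exp(-0.5 * ((forcing + 0.9) / 0.8) ** 2)    # hypothetical, weak constraint

def posterior_mean(lo, hi):
    """Posterior mean of the forcing under a uniform prior on [lo, hi]."""
    prior = ((forcing >= lo) & (forcing <= hi)).astype(float)
    post = prior * likelihood
    post /= post.sum()
    return float((forcing * post).sum())

print(f"prior [-1.5, +0.5] W/m2: posterior mean {posterior_mean(-1.5, +0.5):+.2f} W/m2")
print(f"prior [-1.5, -0.3] W/m2: posterior mean {posterior_mean(-1.5, -0.3):+.2f} W/m2")
# When the data constraint is weak, simply allowing the prior to extend into
# positive forcing territory pulls the posterior toward weaker aerosol forcing.
```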
Re land/ocean: Isaac's plots with the AMIP runs seem to answer my question fairly well.
P.S. @Jules/James: Thanks for all the EGU postings! I enjoyed reading them. It's a shame I missed your talks/posters due to temporal overlap with other sessions which I had to attend. It's also a shame I missed Brahms in the "Musikverein" on Tuesday ...