Thursday, January 18, 2018

More sensitivity stuff

After what feels like a very long hiatus, it seems that people are writing interesting stuff about climate sensitivity again. Just last week on Twitter I saw Andrew Dessler tweeting about his most recent manuscript which is up on ACP(D) for comment. My eyebrow was slightly raised at the range of values he found when analysing outputs of the MPI ensemble, 2.1 to 3.9K, until I realised that these were the outliers from their 100-member ensemble and eyeballing the histogram suggests the standard error on individual estimates (which I didn't see quoted) is around 0.5C or lower. Worth considering, but not a show-stopper in the context of other uncertainties we have to deal with. It would, I think, be interesting to consider whether more precise estimates can be calculated with a more comprehensive use of the data, such as by fitting a simple model to the time series rather than just using the difference between two snapshots. Which, coincidentally (or not), is something I might have more to talk about in the not too distant future.

Then just today, a new paper using interannual variability as an emergent constraint. By chance I bumped into one of the authors last week in Leeds so had a good idea what was coming but have not had time to consider it in much detail. (The Nature paper is paywalled but has a copy already.) Here's a screenshot of the main analysis for those who can't be bothered downloading it. The x-axis is a measure of interannual variability over the observational period, and the letters are CMIP models.

Using interannual variability to diagnose the equilibrium response has a somewhat chequered history, eg here and here for my previous posts though the links to the underlying papers are dead now so I've put the new ones here:

The central problem with the Schwartz approach is the strong (and wrong) assumption that the climate system has a single dominant time scale. It is easy to show (I may return to this in a future post) that the short time scale response simply cannot in principle directly constrain the equilibrium response of a two-time scale system. So this may be why the idea has not been followed up all that much (though in fact Andrew Dessler has done some work on this, such as this paper for example).

The latest paper gets round this by essentially using climate models to provide the link between interannual variability and equilibrium response. It remains possible that the models all get this wrong in a similar manner and thus the real climate system lies outside of their prediction, but this “unknown unknown” issue intrinsically applies to just about everything we ever do and isn't a specific criticism of this paper. My instinct is their result is probably over-optimistic and future work will find more uncertainties than they have presented, but that could just be a reflexive bias on my part. For example, it is not clear from what is written that they have accounted for observational uncertainty in their constraint, which (if they have not done) will probably bias the estimate low as uncorrelated errors will reduce their estimate of the real system's autocorrelation relative to the models where obs are perfect. There is also a hint of p-hacking in the analysis but they have done some quite careful investigation and justification of their choices. It will certainly provide an interesting avenue for more research.
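
To make the observational-noise point concrete, here is a minimal sketch, assuming the real temperature variability behaves like an AR(1) process and the observational errors are white noise that is independent from year to year. Under those assumptions the lag-1 autocorrelation of the noisy series is the true value multiplied by the signal variance divided by the total variance. The coefficient 0.6, the 0.1 standard deviation and the two noise levels are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series (mean removed)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def simulate_ar1(n, phi, sigma, rng):
    """AR(1) series with lag-1 coefficient phi and stationary std sigma."""
    innovations = rng.normal(0.0, sigma * np.sqrt(1.0 - phi**2), size=n)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + innovations[i]
    return x


def attenuated_autocorr(phi, sigma_signal, sigma_noise):
    """Expected lag-1 autocorrelation after adding white noise to an AR(1) signal."""
    return phi * sigma_signal**2 / (sigma_signal**2 + sigma_noise**2)


# Illustrative values only (assumptions, not taken from the paper).
PHI, SIGMA_SIGNAL = 0.6, 0.1
rng = np.random.default_rng(0)
truth = simulate_ar1(100_000, PHI, SIGMA_SIGNAL, rng)

for sigma_noise in (0.02, 0.05):
    obs = truth + rng.normal(0.0, sigma_noise, size=truth.size)
    # Columns: noise std, sampled autocorrelation, closed-form expectation.
    print(sigma_noise,
          round(lag1_autocorr(obs), 3),
          round(attenuated_autocorr(PHI, SIGMA_SIGNAL, sigma_noise), 3))
```

The size of the bias depends entirely on the ratio of noise variance to signal variance, so whether it matters in practice comes down to how large the uncorrelated errors really are.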


Everett F Sargent said...

If I understand your last sentence correctly, their current estimate is biased somewhat high and correcting for autocorrelation will lower their current estimate somewhat. Is that, more or less, correct?

Everett F Sargent said...

I meant your next to last sentence. Sorry about that.

Everett F Sargent said...

OK, I've taken a very quick look at the paper. so that their 'optimism' is with respect to their for a somewhat lower upper bound for ECS (e. g. less of a fat upper bound tail).

Sorry for any confusion on my part.

James Annan said...

Precisely opposite to your first comment, I was speculating that the noise in the obs had pushed their autocorrelation down a bit which would lower their estimate of S. A quick calculation (which I've now done) suggests that this effect is negligible however.

Paul Skeoch said...

Tried to read the paper and understand what they did, but not sure I do. Essentially they've defined a "variability metric" and tried to calculate equivalent values for that in models and real world global average surface temperature data.

Their Figure 2a plot seems to show that there was little disagreement or ECS correlation in their variability metric in the early 20th Century, but that quickly changes post-1950. Which indicates this is largely about differences in forced response, both to natural (mostly volcanic) and anthropogenic factors.

The difference between observational datasets is interesting. The NOAA analysis is an outlier on the high side, but then GISS indicates the smallest variability despite being essentially the same data with an infilled Arctic. The r^2 value for annual average anomalies between the two datasets is >0.99. It suggests their detrending procedure is quite sensitive and is a big factor in the results.

James Annan said...

Agree, and this is why I wondered about p-hacking. On the other hand, their results don't look to be hugely sensitive to the details so long as they include the recent warming/forced period.

And on checking it seems unlikely that the uncorrelated obs errors matter, according to HadCRUT4 they are only about 0.02C per year which I think is too small to make a difference.

Paul Skeoch said...

Actually, it is possible that they used NOAA and GISS data with different ERSST versions. The upgrades to v5 were happening around the same time that they submitted the paper. Even so, I don't think there's that much difference between v4 and v5.

Wonder what you'd get applying it to the recent SST reconstruction by Cowtan et al.? My suspicion, just looking at trajectories in comparison to the HadSST3 version and CMIP5 mean is that it would support significantly higher sensitivities.

Everett F Sargent said...

"Precisely opposite to your first comment ..."

Yeah, I thought so. It took three tries, and the 3rd one still doesn't quite parse out correctly. Thanks.

David Young said...

The question I have is why this latest paper should be given more credibility than the seeming scores of emergent constraint papers lately. Some of the other ones, if I recall, found higher ECS in the 4C range. There are literally thousands of potential emergent properties of models. It seems to me like a fruitless exercise, albeit one that is easy to perform and write a paper about. Of much more concern are some recent negative results showing, for example, that the ECS of a GCM is quite sensitive to details of convection modeling and that there were no obvious observational constraints to set these parameters. Lots of work on aggregation too suggests that it's important and absent in current models. Also low clouds in models seem to show incorrect time histories resulting in increased forcing at the surface. All of this is totally unsurprising but should be motivating an urgent search for answers.

Dessler et al looks to me like an interesting exercise perhaps about the deficiencies of the model studied but any real world implications are totally dependent on the skill of the model used.

So I'm not sure this recent uptick of papers has done much to help us in constraining ECS. Maybe that's your point too.

Andrew Dessler said...

The ECS estimate in our paper was not intended to be a rigorous estimate; rather, it's just an example of what you can do with our framework. In fact, it’s an emergent constraint estimate, which, if you read my twitter feed, you’d know I’m not crazy about. To understand why it’s in there, you have to know something about the history of the paper. We had previously submitted a version that does not have the ECS estimate in it to another journal. The reviewers seemed confused about the utility of our revised energy balance framework and questioned what the point was. In response, we decided we had to better show what the potential uses were, so we put that short discussion of ECS in the paper. However, you’ll notice that we don’t cite those numbers in the abstract or the conclusions — that’s a signal that the values are for illustration and not to be considered an important result in our paper. We have another paper that is basically ready to submit that has a more rigorous ECS range (likely 2.4-4.5 K) based on our revised energy balance framework.

Everett F Sargent said...

Getting out my handy Denier Rolodex, thumbing through it, ha ha, found it ...

Argument from incredulity, I see.

Arguments from incredulity are called non sequiturs.

So be it.

Thorsten Mauritsen said...

I believe Cox et al. used the Cowtan correction for 2m temperature warming faster than SST and filling of unobserved regions. Didn't make any difference, though. From some back-of-the-envelope estimates, I also doubt the uncorrelated errors in observations will give you any appreciable bias.

No, it would seem to me the real assumption is whether or not models can be used for the calibration of the coefficients in the relationship between lag-1 autocorrelation and long-term ECS. Here I could see that there might be some unaccounted-for biases, though I doubt they are really big.

Paul Skeoch said...


"I believe Cox et al. used the Cowtan correction for 2m temperature warming faster than SST and filling of unobserved regions. Didn't make any difference, though."

I'm guessing that's responding to my reference to Cowtan et al.? By my reading they don't do that exactly, but they did test against the Berkeley Earth Land+Ocean dataset, which should be very similar, and with GCMs masked to HadCRUT4 coverage, finding little difference.

But that's not the Cowtan et al. reference I was talking about. It's this one, published a few weeks ago, concerning bias identification in raw SST data, presenting an alternative (albeit highly preliminary) SST history compared to HadSST and ERSST. Of particular potential relevance, it suggests generally significantly lower anomalies all the way from 1940 through to the late 1970s. It just seemed a little strange that the NOAA data would produce such different results, and I'm wondering how far alternative SST histories might affect it.

Paul Skeoch said...

There was another recent paper which may have some relevance for climate sensitivity estimates. They infer a high precision estimate of mean full depth ocean temperature change between LGM and pre-industrial of 2.57 +/- 0.24 K, which is larger than GCMs produce. Using GCM relationships between GMST and mean ocean temperature change, it would suggest LGM GMST cooling of 5-7.5K.

James Annan said...

Thanks for the interesting ref! I hadn't seen that.

Not entirely convinced you can use model simulations for inferring surface temp like that - their LGM simulations are potentially quite far from equilibrium as they are nowhere near long enough for the deep ocean to equilibrate. Though some are initialised from a previously-calculated cold LGM state (eg previous model version, or the output of another GCM that already did it) so they may not all be too bad. But it's perhaps a bit of a leap to trust them in this way.

The stated 5-7.5 does not account for the uncertainty in their estimate either (merely the model spread in ratio), which could widen the range a bit. Still, will be interesting to see what the reaction to this is - 2.5C sounds like a lot in comparison to the MARGO SST.

Paul Skeoch said...

Only reason I saw it myself was because it got some promotion in "skeptic" circles. They are good for something. In articles accompanying the publication one of the authors was quoted saying that the mean ocean temperature had only warmed by about 0.1K over the past few decades. This was accurate and old information, but was numerically confusing for the uninitiated and got jumped on.

Thanks for the point about model ocean temps, was something I was wondering about.

Thorsten Mauritsen said...

Paul, I don't have access to the new paper right now, but intuitively a long-term excursion in the bias correction shouldn't change the lag-1 autocorrelation much.

James Annan said...

Disagree with that Thorsten, the autocorrelation will be pushed to high values if the time series curves away from the linear detrending for an extended period.

Paul's first comment seems particularly relevant here. The underlying theory really relates to internal variability but the constraint appears to depend on the forced response...which presumably depends on the particular forcings in the model as well as its response to them. I wonder if this analysis could be a disguised way of mining for the aerosol forcing/response?
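
The detrending point can be checked with a toy calculation. This is a minimal sketch, assuming pure white-noise "internal variability" added to two hypothetical forced signals with the same end-to-end change: one a straight line, the other a quadratic that curves away from any straight-line fit. The window length of 55 points and the amplitudes are illustrative assumptions.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D series (mean removed)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def linear_detrend(y):
    """Remove the least-squares straight line from y."""
    t = np.arange(y.size, dtype=float)
    return y - np.polyval(np.polyfit(t, y, 1), t)


def mean_detrended_autocorr(curve, noise_sd, n_trials, rng):
    """Average lag-1 autocorrelation of (curve + white noise) after linear detrending."""
    values = []
    for _ in range(n_trials):
        series = curve + rng.normal(0.0, noise_sd, size=curve.size)
        values.append(lag1_autocorr(linear_detrend(series)))
    return float(np.mean(values))


# Hypothetical 55-point window; amplitudes are illustrative assumptions.
n = 55
t = np.linspace(0.0, 1.0, n)
straight = 1.0 * t      # purely linear forced change
curved = 1.0 * t**2     # same end-to-end change, but not a straight line

rng = np.random.default_rng(1)
r_straight = mean_detrended_autocorr(straight, 0.1, 2000, rng)
r_curved = mean_detrended_autocorr(curved, 0.1, 2000, rng)
print(round(r_straight, 3), round(r_curved, 3))
```

The noise is identical in character in both cases, so any difference between the two averages comes purely from the part of the curve that the linear fit cannot remove.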

Thorsten Mauritsen said...

Dear James, perhaps you are right; I hadn't thought about it that way, being more focused on year-to-year errors. That said, Cox et al. do use 55-year detrended segments, so some estimates will be affected by the Cowtan bias excursion, whereas others will not. If you then inspect Cox et al. Figure 2a you will find that Psi is surprisingly stable across a century. Thus, I stand by my intuition that this bias correction does not matter much, but once the data is public it will be easy enough to test.

I agree this could be interesting regarding aerosols, though I think one would somehow need to tailor the method toward the problem, and I am not quite sure how. With RFMIP coming up, this will become easier.

Paul Skeoch said...

Ok, I've been able to replicate what Cox et al. did now. Here's what different obs. datasets produce. It seems that they must have used the GISS met station-only global average for the figure in the paper. The GISS Land+Ocean data produces results closer to NOAA (as you'd expect) and that's even more so the case for a version of GISS L+O using ERSSTv4 found using The Wayback Machine.

NOAA and GISS average higher values overall, though I found that squashing down the 1940s hump a bit moved them closer to the HadSST-based datasets.

The Cowtan-SSTbias curve is a very quick and dirty (so best to take with a pinch of salt) blend of Cowtan and Way Land average + SST via their bias identification from the new paper. It actually lowers the overall variability metric relative to HadCRUT4, though generally raises it for the most recent decades.

There are two factors contributing - standard deviation and lag-1 autocorrelation - and how the combination of those in models does and doesn't match the observed variability metric on average seems quite complex. In your write-up you mention the problem of "unknown unknowns" regarding model performance, but isn't there a known issue here: that models don't do ENSO behaviour well, and that's one of the key factors producing observed variability statistics?
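
For anyone wanting to try the same thing, here is a minimal sketch of how the two factors above combine, assuming the metric is Psi = sigma / sqrt(-ln(alpha)), with sigma the standard deviation and alpha the lag-1 autocorrelation of a linearly detrended 55-year window. That is my reading of the definition and should be checked against the paper. The function names and the synthetic input series are hypothetical, and this is not the authors' code.

```python
import numpy as np


def psi_from_parts(sigma, alpha):
    """Combine a standard deviation and a lag-1 autocorrelation into Psi."""
    return sigma / np.sqrt(-np.log(alpha))


def psi_metric(window):
    """Psi for one window after removing a least-squares straight line.

    Returns NaN when the lag-1 autocorrelation is outside (0, 1),
    where the logarithm-based formula is not usable.
    """
    window = np.asarray(window, dtype=float)
    t = np.arange(window.size, dtype=float)
    resid = window - np.polyval(np.polyfit(t, window, 1), t)
    sigma = resid.std(ddof=1)
    alpha = np.dot(resid[:-1], resid[1:]) / np.dot(resid, resid)
    if not 0.0 < alpha < 1.0:
        return float("nan")
    return float(psi_from_parts(sigma, alpha))


def rolling_psi(series, width=55):
    """Psi for every complete window of `width` consecutive values."""
    series = np.asarray(series, dtype=float)
    return np.array([psi_metric(series[i:i + width])
                     for i in range(series.size - width + 1)])


def synthetic_ar1(n, phi, sigma, rng):
    """Synthetic AR(1) anomalies standing in for a real temperature record."""
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal(0.0, sigma * np.sqrt(1.0 - phi**2))
    return x


rng = np.random.default_rng(2)
series = synthetic_ar1(150, 0.6, 0.1, rng) + 0.008 * np.arange(150)
print(rolling_psi(series).round(3))

# Different (sigma, alpha) pairs can give the same Psi, so matching the
# metric does not by itself mean matching both underlying statistics.
print(psi_from_parts(0.10, 0.6), psi_from_parts(0.12, 0.6**1.44))
```

With only 55 points per window both sigma and alpha are noisy estimates, so some scatter between windows is expected even for a stationary series.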

Thorsten Mauritsen said...

Thanks Paul, great to see how all datasets support the Cox et al. study. This is of course something they could have easily done, though choosing a different dataset will not change the central estimate further than 3.0 K.

Yes, as I said above the leap of faith is that models get the ratio of short to long term feedbacks right.