Comments on James' Empty Blog: How not to combine multiple constraints...

I mean the Bayesian emulator ideas introduced by Kennedy and O'Hagan, now being developed in the climate context by <A HREF="http://www.maths.bris.ac.uk/~mazjcr/" REL="nofollow">Jonty Rougier</A> and people at the Hadley Centre. I think this is probably what "Gaussian process emulators" refers to in your paragraph.

The random forests abstract is already up on the EGU meeting site, so I don't think you need to worry about breaching confidences! Interestingly, the poster seems to be withdrawn, which is a shame.

-- James Annan (2009-03-14 23:39)

> "<I>The NN stuff is potentially interesting, although there are alternative (standard) methods for emulation which have a much more rigorous statistical foundation. It is not clear to me why they chose this approach, or how its performance compares to the standard method.</I>"

What do you consider "the standard method"?

I now find out there is work going on to use "a technique called Random Forests as an emulator, which benefits from requiring very little tuning to get good results, and it also provides a measure of uncertainty in each prediction ..... sending out some verification runs on CPDN ....
each with an initial condition ensemble of 4 members, based on a continuous sampling of parameter space (rather than the discrete sampling used in the original ensemble), and am currently analysing the results. [student] plans to compare different emulation techniques, namely neural nets v random forests v Gaussian process emulators, and highlight the strengths and weaknesses of each."

(I expect the above will be made publicly available soon, so I hope there isn't too much harm in posting this now in this obscure location.)

-- crandles (2009-03-14 20:53)

Making no pretense at understanding the math -- this showed up in "related items" searches for climate articles, and vice versa. It appears from this that the economists have rediscovered Catton's "Overshoot" caution about removing redundancy from the human economy to the point where one hiccup causes a crash. They are warning one another about risks of low-probability events -- like whatever happened to the credit system.

Brief excerpt in case it tempts any of the climate scientists to comment:

http://www.edge.org/3rd_culture/taleb08/taleb08_index.html

"3) Beware the "atypicality" of remote events. There is a sucker's method called "scenario analysis" and "stress testing" -- usually based on the past (or some "make sense" theory). Yet I show in the appendix how past shortfalls do not predict subsequent shortfalls. Likewise, "prediction markets" are for fools. They might work for a binary election, but not in the Fourth Quadrant. Recall the very definition of events is complicated: success might mean one million in the bank ... or five billions!

4) Time. It takes much, much longer for a time series in the Fourth Quadrant to reveal its property.
At the worst, we don't know how long. Yet compensation for bank executives is done on a short-term window, causing a mismatch between observation window and necessary window. They get rich in spite of negative returns. But we can have a pretty clear idea if the "Black Swan" can hit on the left (losses) or on the right (profits).

The point can be used in climatic analysis. Things that have worked for a long time are preferable -- they are more likely to have reached their ergodic states.

5) Beware Moral Hazard. It is optimal to make a series of bonuses betting on hidden risks in the Fourth Quadrant, then blow up and write a thank-you letter. Fannie Mae and Freddie Mac's Chairmen will in all likelihood keep their previous bonuses (as in all previous cases) and even get close to 15 million of severance pay each.

6) Metrics. Conventional metrics based on type 1 randomness don't work. Words like "standard deviation" are not stable and do not measure anything in the Fourth Quadrant. Nor does "linear regression" (the errors are in the fourth quadrant), "Sharpe ratio", Markowitz optimal portfolio, ANOVA shmanova, least squares, etc. -- literally anything mechanistically pulled out of a statistical textbook.

My problem is that people can both accept the role of rare events, agree with me, and still use these metrics, which is leading me to test if this is a psychological disorder.

The technical appendix shows why these metrics fail: they are based on "variance"/"standard deviation" and terms invented years ago when we had no computers. One way I can prove that anything linked to standard deviation is a facade of knowledge: there is a measure called Kurtosis that indicates departure from "Normality".
It is very, very unstable and marred with huge sampling error: 70-90% of the Kurtosis in Oil, SP500, Silver, UK interest rates, Nikkei, US deposit rates, sugar, and the dollar/yen currency rate comes from one day in the past 40 years, reminiscent of figure 3. This means that no sample will ever deliver the true variance. It also tells us anyone using "variance" or "standard deviation" (or worse, making models that make us take decisions based on it) in the Fourth Quadrant is incompetent.

7) Where is the skewness? Clearly the Fourth Quadrant can present left or right skewness. If we suspect right-skewness, the true mean is more likely to be underestimated by measurement of past realizations, and the total potential is likewise poorly gauged. A biotech company (usually) faces positive uncertainty; a bank faces almost exclusively negative shocks. I call that in my new project "concave" or "convex" to model error ...

-- Hank Roberts (2008-09-27 17:41)

Their paper really discussed methods and generalities with no solid numerical answers, so there is nothing really to "correct" in that sense. The difficulty in real applications is probably in deciding what the matrix should be, rather than actually solving the resulting problem (it may depend on the size of the problem too).

I'd be very happy if the whole sorry mess died a death, but the worst-case scenario of course is that some poor sod unconnected with the original work reads "we have chosen to average the quadratic cost functions rather than adding them. ... to combine errors in this [latter] way would be incorrect", takes this to mean that averaging is correct, and does some completely bogus calculation as a result. Of course I may be fussing over nothing.
We'll have to wait and see... but it seems they have gone to some lengths to promote this averaging method while maintaining plausible deniability. That is, they have not actually made any explicit claims that averaging is <I>ever</I> correct in <I>any</I> situation at all, but it is easy to see how this paper - and indeed that specific section I quoted - could be interpreted as supporting the approach.

-- James Annan (2008-09-12 11:03)

> "Try plotting x^2, 0.01x^2 and 100x^2 (well you hardly need to go to the trouble...) and of course they all have the same minimum, which in mathematical terms is perfectly well defined, but in the context of the paper this phrasing refers to the spread in x for which the cost does not vary very much (for some rather vague definition of "very much") and thus the minimum of 100x^2 is much more tightly defined than 0.01x^2."

Yes, I expressed that badly: it needed a relative comparison (i.e. relatively how well defined it is). If you plot x^2+30 and y^2+30, you may find x is better defined than y for some vague "not very much" (a%?) variation in cost. If you instead plot 100x^2+3000 and 100y^2+3000, you will still end up with the same conclusion that x is relatively better defined than y.

So if I have now understood correctly (though I have demonstrated that I failed to express it clearly in writing), the conclusion in the abstract that I quoted remains unaffected in the idealised case of independent observations.

> "The main point remains that the operation of averaging the constraints cannot correspond to learning from new observations."

I don't, at present, see any reason to dispute this, or that it is the main point of your post. So I have nothing much to say.
I would prefer to discuss the extent to which the problem you have identified and explained remains a problem for the conclusions that have been left in: i.e. could co-varying observations change a well-defined relative comparison if your correct full covariance matrix method were used instead of the paper's averaging approach?

I am just wondering if no answer means the full matrix solution could well be either an intractable problem, which might make it not an available alternative, or that the effects are so negligible they are drowned out by other issues.

Either of these would appear to make some sense of your decision not to write a comment ... but wait until the method is copied somewhere else in a way that it can affect a paper's conclusions?

Probably too soon for such pondering; absence of evidence ...

-- crandles (2008-09-11 16:38)

<I>I get the impression that an "unjustifiable multiplicative scaling factor of sqrt(N) on the width of the implied joint likelihood function" would not have any effect on where the minimum is or how well defined it is.</I>

It won't affect the location, but in context "how well defined it is" is precisely the "sharpness" of the minimum. Try plotting x^2, 0.01x^2 and 100x^2 (well, you hardly need to go to the trouble...): of course they all have the same minimum, which in mathematical terms is perfectly well defined, but in the context of the paper this phrasing refers to the spread in x for which the cost does not vary very much (for some rather vague definition of "very much"), and thus the minimum of 100x^2 is much more tightly defined than that of 0.01x^2.
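A minimal numerical sketch of this sharpness point (the function name and the tolerance delta = 1 are illustrative choices, nothing from the paper): for a quadratic cost a*x^2 the minimum is always at x = 0, but the spread of x over which the cost stays within delta of the minimum is 2*sqrt(delta/a).

```python
import math

# For cost J(x) = a * x**2, every value of a gives the same minimum (x = 0),
# but the interval of x over which the cost stays within `delta` of that
# minimum, {x : a * x**2 <= delta}, has width 2*sqrt(delta/a): a larger
# coefficient means a more sharply defined minimum.
def low_cost_width(a, delta=1.0):
    """Width of the interval {x : a * x**2 <= delta}."""
    return 2.0 * math.sqrt(delta / a)

for a in (0.01, 1.0, 100.0):
    print(f"a = {a:6g}: width of low-cost region = {low_cost_width(a):g}")
# widths are roughly 20, 2 and 0.2: same minimum, very different sharpness
```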
The main point remains that the operation of averaging the constraints cannot correspond to learning from new observations.

I'm surprised the final version is not on the CPDN web site, as they usually put their stuff up pretty quickly.

The NN stuff is potentially interesting, although there are alternative (standard) methods for emulation which have a much more rigorous statistical foundation. It is not clear to me why they chose this approach, or how its performance compares to the standard method.

-- James Annan (2008-09-11 15:02)

Cannot comment much without reading the paper. The abstract includes:

> "The use of annual mean or seasonal differences on top-of-atmosphere radiative fluxes as an observational error metric results in the most clearly defined minimum in error as a function of sensitivity, with consistent but less well-defined results when using the seasonal cycles of surface temperature or total precipitation."

I get the impression that an "unjustifiable multiplicative scaling factor of sqrt(N) on the width of the implied joint likelihood function" would not have any effect on where the minimum is or how well defined it is. However, that assumes the observations are independent, or close enough that it doesn't matter. If there are correlations, would you consider this likely/very likely/etc. to have much(?) effect on where in parameter space the minimum is, or on how well defined it is?

If it is likely to have a noticeable effect, is this likely to be intractable, or could estimates be made?

I imagine errors in the neural emulation might (also?) be quite significant.

-- crandles (2008-09-11 11:21)
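A toy numerical sketch of the sqrt(N) issue discussed in this thread, under the idealised assumption of N independent observations with known Gaussian error sigma (all names and numbers below are illustrative, not from the paper): summing the N quadratic cost functions, which corresponds to multiplying the independent likelihoods, and averaging them put the minimum in exactly the same place, but averaging inflates the width of the low-cost region by a factor of sqrt(N).

```python
import math
import random

random.seed(0)
N = 25          # number of (assumed independent) observations
sigma = 1.0     # assumed known observational error
obs = [random.gauss(0.0, sigma) for _ in range(N)]

# Quadratic cost for one observation: J_i(x) = (x - y_i)^2 / (2 sigma^2)
def cost_sum(x):   # correct combination (product of independent likelihoods)
    return sum((x - y) ** 2 for y in obs) / (2 * sigma ** 2)

def cost_avg(x):   # the averaging approach criticised in the post
    return cost_sum(x) / N

def width(cost, lo=-5.0, hi=5.0, steps=20001):
    """Spread of x over which cost(x) stays within 0.5 of its minimum
    (the 1-sigma width of the implied Gaussian likelihood)."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    cs = [cost(x) for x in xs]
    cmin = min(cs)
    inside = [x for x, c in zip(xs, cs) if c <= cmin + 0.5]
    return inside[-1] - inside[0]

ratio = width(cost_avg) / width(cost_sum)
print(f"width(avg) / width(sum) = {ratio:.2f}  (sqrt(N) = {math.sqrt(N):.2f})")
```

Since cost_avg is just cost_sum divided by N, the location of the minimum cannot move; only the apparent sharpness, and hence the implied uncertainty in the parameter, changes, by exactly the sqrt(N) factor when the observations really are independent.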