Comments on James' Empty Blog: "Why the multi-model mean is so good!"

James Annan (2010-06-24 13:22):
Well, I don't think it will really cause a revolution, but since it seems it may be new in fields as diverse as NWP and (financial) forecasting, I would expect quite a few people to at least take note. One thing it will do is justify the use of the MMM more securely. It (or a modified version) may help with unequal weighting too, but I haven't thought about that yet.

crandles (2010-06-24 10:59):
> "It seems like it might have a bit of an impact."

An equation allows the effect to be quantified. Do you foresee people calculating the RMSE for the MMM and then inflating it by a factor to reflect the advantage it has, before comparing it with the other models' RMSEs?

Or something similar, some other quantitative use, or is the impact you refer to something completely different?

James Annan (2010-06-23 22:16):
I think those papers were talking more about something like the standard way of combining Gaussian estimates, i.e. the x1*s2^2/(s1^2+s2^2) + x2*s1^2/(s1^2+s2^2) formula (or something like that), but I haven't yet had time to read them very carefully. They are also only talking about combining two things, which may be a special case.

Anyway, if Eugenia Kalnay says she hasn't seen it before (and she does), that's good enough for me! I'm still baffled by the idea that this equation can really be new, though. It seems like it might have a bit of an impact.

crandles (2010-06-23 20:43):
The first of the two papers linked by Anonymous referenced Gauss 1809, so obviously your search won't be complete until you have gone back at least that far ;)
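The inverse-variance weighting James refers to is easy to write down concretely. This is a minimal sketch of that standard two-estimate combination, not anything from the papers under discussion; the function name and example numbers are my own illustration:

```python
def combine_gaussians(x1, s1, x2, s2):
    """Combine two independent Gaussian estimates (means x1, x2 with
    standard deviations s1, s2) by inverse-variance weighting."""
    w1 = s2**2 / (s1**2 + s2**2)   # weight on x1 grows as s1 shrinks
    w2 = s1**2 / (s1**2 + s2**2)
    x = w1 * x1 + w2 * x2
    # The combined variance is smaller than either input variance.
    var = (s1**2 * s2**2) / (s1**2 + s2**2)
    return x, var ** 0.5

# Example: a precise estimate (10.0 +/- 1.0) and a vague one (14.0 +/- 3.0).
x, s = combine_gaussians(10.0, 1.0, 14.0, 3.0)
# x = 10.4 (pulled only slightly towards the vague estimate), s ~ 0.95
```

Note the contrast with the MMM result: this formula needs the error variances of the inputs, whereas the equal-weighted mean result holds with no such knowledge.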
James Annan (2010-06-22 23:30):
Belette,

I'll post about the mean being better than all models, but this result explains precisely the "thing" that I set out to explain, which is that the mean outperforms most models (though not necessarily all) by a significant margin.

I can't really bring myself to dis Armstrong when it seems that no-one else knows this result (even though some can at least prove that the mean always outperforms a typical model). I'm relieved that my co-author vetoed some slightly sarcastic comments in our manuscript about climate scientists being unaware of this well-known result!
Anonymous (2010-06-22 18:12):
The closest I can find are these:

http://journals.ametsoc.org/doi/abs/10.1175/1520-0493%281977%29105%3C0228%3AHTIABC%3E2.0.CO%3B2
http://journals.ametsoc.org/doi/abs/10.1175/1520-0493%281977%29105%3C1198%3ACOTIAB%3E2.0.CO%3B2

From a reference in an ECMWF paper, which also discusses the issue.

William M. Connolley (2010-06-22 17:16):
I'm not convinced your result explains things. Yes, it is very neat and all, but it only explains what you say: why |avg(m)-O| is better than avg(|m-O|) (or whatever :-).

But the "observation" is that avg(m) is generally better than most if not all of the models. I don't think you've explained that.

Go on, write a paper dissing Armstrong and his ilk.

James Annan (2010-06-21 05:30):
Wow, there are decades of literature on combining forecasts surveyed there, and it seems that none of them know this result.
Chip Knappenberger (2010-06-19 16:36):
James,

Armstrong does seem to rely on empirical evidence rather than a proof like yours.

Looking through his Principles of Forecasting (available to browse at Amazon: http://www.amazon.com/Principles-Forecasting-International-Operations-Management/dp/0792374010/ref=sr_1_1?ie=UTF8&s=books&qid=1276960282&sr=1-1), especially his section on Combining Forecasts in Chapter 13, it seems dominated by empirical arguments.

You could always ask him whether the mathematical solution is well known... :^)

-Chip
David B. Benson (2010-06-18 23:29):
Oh my, that was easy!

But I've not seen exactly that before, as best I can recall.
James Annan (2010-06-18 22:56):
Chip, thanks for that.

Coincidentally, I saw something related a while back: this Armstrong article on significance testing (http://www.forecasters.org/ijf/journal-issue/300/article/6157) criticises a paper, KFHS (http://www.forecasters.org/ijf/journal-issue/282/article/5997), which states: "Finally, we find that the M3 conclusion that a combination of methods is better than that of the methods being combined was not proven."

Well, it has been now :-)

(The Armstrong paper is unaware of the proof, but focuses on the inappropriate use of significance testing - another of my frequent complaints.)

I can't help but be amused that these self-appointed experts in forecasting don't know this result (though, as DM showed, it is definitely well known in some quarters). Anyway, I can hardly be too snarky about Armstrong when the climate scientists were unaware of it too!
Chip Knappenberger (2010-06-18 18:26):
James,

I came across a reference to a similar topic in the forecasting literature recently. Perhaps you may find it to be of interest:

http://www.forecasters.org/ijf/journal-issue/310/article/6222#abstract

-Chip

Arthur (2010-06-18 12:46):
Yes, the second part of my comment was essentially James' point - however, he hadn't exactly specified the problem completely, and I was thinking more in terms of whether observations remain within the bounds set by the variance in the "model" - well, I was slightly confused on the point anyway. Within the context of RMS differences, this post does nicely explain why the MMM is always "better".

James Annan (2010-06-18 12:32):
Thanks DM, I'm sure this idea pops up all over the place where people use ensembles. The book is here for anyone interested: http://books.google.com/books?hl=en&lr=&id=-aAwQO_-rXwC&oi=fnd&pg=PA1&dq=Neural+networks+for+pattern+recognition%22+Chris+Bishop&ots=FIMTBkCRgu&sig=b7atP2i_e0Ni_TOMO9cVomOIR78#v=onepage&q=committee&f=false

I agree that my version is neater :-)

Dikran Marsupial (2010-06-18 10:07):
The result that the sum-of-squared error (SSE) of the mean of a committee of models is less than the mean SSE of the individual committee members is quite well known in machine learning, where committees of models are often used as a variance-reduction technique. For example, see pages 365-6 of "Neural Networks for Pattern Recognition" by Chris Bishop (an excellent book on the topic - 14,000 cites according to Google Scholar!), which obtains the same result by a slightly different method.

The reason this happens is that the MMM is smoother and lies towards the middle of the distribution of the individual models. The SSE punishes large errors much more harshly than small ones (due to the squaring), so the average of the errors for the individual models will be dominated by the models with the greatest errors. That happens for two reasons: either they lie on the other side of the mean from the observations (high bias), or they are very noisy and the noise is badly correlated with the observations (high variance).

James's proof is still neat though, and easier to understand without the use of expectations.
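The committee result described above is easy to check numerically. Here is a minimal sketch with a synthetic ensemble of my own devising (models = truth plus independent Gaussian noise - an idealisation, not a claim about real climate models):

```python
import random

random.seed(0)

n_models, n_points = 10, 50
truth = [random.gauss(0.0, 1.0) for _ in range(n_points)]
# Each "model" is the truth plus independent noise.
models = [[t + random.gauss(0.0, 1.0) for t in truth] for _ in range(n_models)]

def sse(pred, obs):
    """Sum of squared errors between a prediction and the observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs))

# Multi-model mean at each point.
mmm = [sum(m[j] for m in models) / n_models for j in range(n_points)]

sse_mean = sse(mmm, truth)                                # SSE of the mean
mean_sse = sum(sse(m, truth) for m in models) / n_models  # mean of the SSEs

# The committee inequality: the mean's SSE never exceeds the average SSE.
assert sse_mean <= mean_sse
```

The inequality is an identity (it follows from the convexity of squaring), so it holds for any ensemble, not just this synthetic one; the noise model only affects how large the gap is.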
crandles (2010-06-18 09:28):
> Hey, the ENSO thread is that way ^

OK.

> It turns out that this probability varies widely depending on a handful of factors.

Clearly the greater the variation between models, the greater the advantage the MMM has.

The nearer the obs trend is to the MMM trend, the more likely the MMM is to beat all models.

If the individual models have very similar trends, then the MMM is more likely to beat all models.

A model having deviations from trend very similar to the obs may make it more likely to beat the MMM.

I doubt I am getting up to the 'handful' yet.

Oops, that also was "for another time".
James Annan (2010-06-18 06:37):
Hey, the ENSO thread is that way ^

(I know it's my fault for not writing it yet!)

Steve Bloom (2010-06-18 06:02):
Looking at GISTEMP, I was surprised to see that Jan-May 2010 is a net 0.54 degrees ahead of 1998, and indeed after August 1998 the anomalies dropped off to levels that weren't seen again for 10 years, so at this point I'd have to say I'll be surprised if 2010 isn't a new record.

Steve Bloom (2010-06-18 05:10):
Chris, remember that 1998 dropped off sharply at the end of the year, so all is not lost. All else equal, I suppose we can expect this year to drop off less sharply. So for now the odds look to me to be no worse than even, probably a bit better.

Also, which source was being used for the temps?

James Annan (2010-06-18 02:41):
ac, I'd definitely be interested in seeing any other written version of this. I suspect it has "folklore" status and is rarely written down, as people are usually working on more advanced topics. I certainly can't believe it is unknown in the broader community; indeed, it jumped into my mind immediately when a reviewer asked about it, so I'm sure I must have seen it in a different context, but I don't know where. The multi-model NWP stuff I've read doesn't seem to quite state it, though.

Alastair, we can easily create ensembles where one model is closer to the obs than the mean is - just add a perfect model (i.e. one that matches the obs) to the ensemble! Also, error cancelling will often occur in practice, but it is not necessary for the formula to work.
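The point that error cancelling is not required can be checked against the decomposition behind the post: the mean of the models' SSEs equals the SSE of the mean-model error plus the mean squared spread of the models about their mean. A quick sketch (my own code), using the same-sign error vectors (2, 4) and (4, 6) that appear in Alastair's example elsewhere in this thread, so that no errors cancel:

```python
def decompose(errors):
    """Split the mean SSE of an ensemble of error vectors into the SSE of
    the ensemble-mean error plus the mean squared spread about that mean."""
    n, k = len(errors), len(errors[0])
    mean_err = [sum(e[j] for e in errors) / n for j in range(k)]
    mean_sse = sum(sum(x * x for x in e) for e in errors) / n
    sse_of_mean = sum(x * x for x in mean_err)
    spread = sum(sum((e[j] - mean_err[j]) ** 2 for j in range(k))
                 for e in errors) / n
    return mean_sse, sse_of_mean, spread

# Two models with errors (2, 4) and (4, 6): all the same sign, no cancelling.
mean_sse, sse_of_mean, spread = decompose([(2, 4), (4, 6)])
# mean_sse = 36, sse_of_mean = 34, spread = 2, and indeed 36 = 34 + 2.
```

Since the spread term is non-negative, the mean's SSE (34) is below the ensemble-average SSE (36) even here, where every individual error has the same sign.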
Alastair (2010-06-18 01:39):
Answering my own question:

-1, 3 -> 1 + 9 = 10
-3, 1 -> 9 + 1 = 10
-2, 2 -> 4 + 4 = 8

so that works, but what about

2, 4 -> 4 + 16 = 20
4, 6 -> 16 + 36 = 52
3, 5 -> 9 + 25 = 34

The mean's 34 > the 1st model's 20!

ac (2010-06-18 00:36):
Neat. I recall reading a similar analysis in the seasonal prediction literature, but the focus was correlation, not RMSE, and the theme was potential predictability. It could have been a tech report or just a set of slides. If it's of interest, I might be able to scrape my memory for an author or other unique identifier.

Alastair (2010-06-18 00:04):
What if the errors are (-1, 3) and (-3, 1)?

James Annan (2010-06-17 23:43):
Chris, an ENSO post is long overdue, and on my list.
In brief, yes, I suspect you are right...

James Annan (2010-06-17 23:25):
Chris, I didn't get that out of what Arthur said, but I might have misunderstood it (the first part of his comment may have put me off the track).

Belette, fixed, but not exactly as you suggest!

Anon #1, well, if the mean is better than the typical model (in practice by a significant margin), it is surely going to be better than *most* models (for reasonable distributions). As Tim says, the question of the mean being better than *all* models is not amenable to a direct proof. In fact, in the paper we consider it in probabilistic form - that is, what is the probability of a single model being better than the mean? It turns out that this probability varies widely depending on a handful of factors.
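The probabilistic question raised in the last comment - how often does a single model beat the mean? - can be explored with a Monte Carlo sketch. This is a toy setup of my own (exchangeable models with independent Gaussian errors about the truth), not the experiment from the paper; with real, biased ensembles the probability could be very different:

```python
import random

random.seed(1)

def frac_models_beating_mean(n_models=10, n_points=100, spread=1.0, trials=200):
    """Fraction of individual models whose SSE is lower than the SSE of the
    multi-model mean, for models scattered symmetrically about the truth."""
    beats = total = 0
    for _ in range(trials):
        # Errors of each model at each point (truth taken as zero).
        errs = [[random.gauss(0.0, spread) for _ in range(n_points)]
                for _ in range(n_models)]
        mean_err = [sum(e[j] for e in errs) / n_models for j in range(n_points)]
        sse_mean = sum(x * x for x in mean_err)
        for e in errs:
            total += 1
            if sum(x * x for x in e) < sse_mean:
                beats += 1
    return beats / total

p = frac_models_beating_mean()
# With many points and unbiased, exchangeable models, a single model almost
# never beats the mean; p here comes out very close to zero.
```

This matches one of the "handful of factors": shrinking the spread between models, or biasing them all to one side of the obs, pushes this probability up sharply.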