Tuesday, September 15, 2020

SAGE versus reality

Something I've been meaning to do for a while is look at how well the SAGE estimates of the growth rate of the epidemic have matched up to reality over the long term. For the last 3 months now, SAGE have published a weekly estimate not only of R but also of the daily growth rate, which is actually a more directly interpretable number (as well as being provided to a higher degree of precision). What I have done is take their estimate of the daily growth rate, integrate it over time, and plot the result against the number of cases actually reported.

Here we are:

The solid blue line is the central estimate from SAGE, with the dashed lines calculated using the ends of the range they published each week. Red is the weekly mean number of cases over this time period, with this line scaled to start at the same place in week 1 (ending on Friday 19 June). Latest SAGE estimate in this plot is from Friday 11 Sept.
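For anyone wanting to reproduce this sort of comparison, here is a minimal sketch of the integration in Python. It assumes each weekly estimate is a daily growth rate that holds for all 7 days of its week and compounds daily. The function names and the numbers are made up for illustration; they are not the published SAGE values or the real case counts.

```python
def integrate_growth(daily_growth_by_week, days_per_week=7):
    """Relative epidemic size at weekly intervals, starting from 1.0.

    Each entry is a daily growth rate as a fraction (-0.03 means shrinking
    by 3% per day), assumed to apply to every day of the following week.
    N weekly estimates therefore give N + 1 points.
    """
    sizes = [1.0]
    for g in daily_growth_by_week:
        sizes.append(sizes[-1] * (1.0 + g) ** days_per_week)
    return sizes


def scale_to_start(series):
    """Rescale a series so that its first value is 1.0."""
    return [value / series[0] for value in series]


# Hypothetical weekly estimates, for illustration only.
central = [-0.03, -0.03, -0.02, -0.01]
lower = [-0.04, -0.04, -0.04, -0.03]   # bottom of the range, week after week
upper = [-0.02, -0.02, 0.00, 0.01]     # top of the range, week after week

central_curve = integrate_growth(central)
lower_curve = integrate_growth(lower)
upper_curve = integrate_growth(upper)

# Hypothetical weekly mean case counts, scaled to start at the same place.
weekly_cases = [1100.0, 900.0, 750.0, 700.0, 720.0]
scaled_cases = scale_to_start(weekly_cases)

for week, (lo, mid, hi, obs) in enumerate(
    zip(lower_curve, central_curve, upper_curve, scaled_cases), start=1
):
    print(f"week {week}: SAGE {mid:.2f} ({lo:.2f} to {hi:.2f}), cases {obs:.2f}")
```

Compounding the edge of the range every single week is what makes the dashed lines such a generous envelope: any random week-to-week error would be expected to average out rather than accumulate.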

Agreement was very good for the first few weeks, with case numbers going down at the rate described by SAGE of about 3% per day. But then the case numbers started to drift up in July...and SAGE continued to say the epidemic was getting smaller. Over the last few weeks the discrepancy has grown sharply. Note that the dashed lines assume the extreme edge of the range presented by SAGE, week after week - so this would require a consistent bias in their methodology, rather than just a bit of random uncertainty.

Honesty compels me to point out that the comparison here is not completely fair, as the number of cases may not be a consistent estimate of the size of the outbreak. Some of the rise in cases may be due to increased testing. However the discrepancy between case numbers and the mean SAGE estimate is now a factor of 10 compared to the starting point of this analysis. That's not due to better testing alone!

Saturday, September 12, 2020

Weekly RRRRRRReport

A lot of different estimates of the reproduction number (R) of the epidemic have come out in the last couple of days, so here's a summary of which ones are wrong (and why) and which ones you can believe. And who am I to do this, you might reasonably ask? While not an epidemiologist, my professional expertise is in fitting models to data, which is precisely what this question demands. And the available evidence suggests I'm rather better at it than many epidemiologists appear to be.

As you may recall, a month ago I posted an argument that R really couldn't be under 1 any longer, and the epidemic was starting to grow again. At the time, the "experts" of SAGE were still insisting that R was less than 1, and they kept on claiming that for a while, despite the very clear rise in reported case numbers. The rise has continued and indeed accelerated a bit, other than for a very brief hiatus in the middle of last month. Part of this steady rise might have been due to a bit more testing, but for a while now it's been pretty implausible that all of it was. I'll come back to SAGE's ongoing incompetence later.

I'll start with my own estimate which currently comes out at R= ~1.3. This is based on fitting a very simple model to both case and death data, which the model struggles to reconcile due to its simplicity. The average death rate (as a percentage of infected people) has dropped in recent weeks, thanks to mostly younger people being infected recently, and perhaps also helped by some improvements in treatment. I could try to account for this in the model but haven't got round to it. So it consistently undershoots the case numbers and overshoots deaths a bit, but I don't think this biases the estimate of R enough to really matter (precisely because the biases are fairly constant). Incidentally, the method I'm using for the estimation is an iterative version of an ensemble Kalman smoother, which is a technique I developed about 15 years ago for a different purpose. It's rather effective for this problem and clearly superior to anything that the epidemiologists are aware of. Ho hum.

Here are my plots of the fit to cases (top) and deaths (bottom) along with the R number.

As pointed out, these graphs need better annotation. Top graph is modelled daily infections (blue plume), modelled daily cases (green plume with some blue lines sampled from the ensemble and the median shown as magenta) and case ascertainment ratio which is basically the ratio of these (red plume, RH scale). Reported case numbers are the red circles. Bottom graph is modelled deaths (green plume with lines again) with data as red circles. Red plume here is the effective R number (RH scale). R number and case ascertainment are the fundamental parameters that are being fitted in my approach. Infection fatality rate is fixed at 0.75%.

So far, so good. Well, bad, but hopefully you know what I mean.

Another relevant weekly analysis that came out recently is the infection pilot survey from ONS. Up to now it's been pretty flat and inconclusive, with estimates that have wobbled about a little but with no clear signal. This all changed with their latest result, in which the previous estimate of 27,100 cases (uncertainty range 19,300 - 36,700) in the week of 19 - 25 Aug increased to 39,700 (29,300 - 52,700) in the week 30 Aug - 5 Sept. That is a rise of 46% in 11 days or about 3.5% per day. R is roughly the 5-day growth rate (for this disease), so that corresponds to an R value of 1.2, but note that their analysis doesn't extend over the past week when the cases have increased more sharply.

Actually, I don't really think the ONS modelling is particularly good - it's a rather arbitrary curve-fitting exercise - but when the data are clear enough it doesn't matter too much. Just looking at the raw data that they kindly make available, they had almost 1 positive test result per 1000 participants over the fortnight 23 Aug - 5 Sept (55 cases in 59k people) which was 65% up on the rate for the previous fortnight of 26 cases in 46k people. Again, that works out at R=1.2.
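Both of these back-of-envelope calculations can be checked with a few lines of Python. This is only a sketch of the arithmetic: it takes the rule of thumb that R is roughly the 5-day growth factor, and it assumes the two fortnights of raw data are centred 14 days apart. The function name is my own invention.

```python
import math

GENERATION_DAYS = 5  # rule of thumb: R is roughly the 5-day growth factor


def growth_and_r(before, after, days_apart, generation_days=GENERATION_DAYS):
    """Turn two measurements taken days_apart days apart into
    (percentage growth per day, approximate R)."""
    daily_log_rate = math.log(after / before) / days_apart
    daily_pct = math.expm1(daily_log_rate) * 100.0
    r_value = math.exp(daily_log_rate * generation_days)
    return daily_pct, r_value


# ONS modelled estimates: 27,100 rising to 39,700 over 11 days.
modelled_pct, modelled_r = growth_and_r(27_100, 39_700, days_apart=11)

# ONS raw data: 26 positives in 46k people, then 55 in 59k a fortnight later
# (the 14-day spacing is an assumption).
raw_pct, raw_r = growth_and_r(26 / 46_000, 55 / 59_000, days_apart=14)

print(f"modelled: {modelled_pct:.1f}% per day, R of about {modelled_r:.1f}")
print(f"raw data: {raw_pct:.1f}% per day, R of about {raw_r:.1f}")
```

Both routes land on R of about 1.2 after rounding, with a daily growth rate of roughly 3.5%.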

A rather worse perspective was provided by SAGE, who continue to baffle me with their inability to apply a bit of common sense and update their methods when they repeatedly give results so strikingly at odds with reality. They have finally noted the growth in the epidemic and managed to come up with an estimate marginally greater than 1, but only to the level of R=1.1 with a range of 1-1.2. And even this is a rounding-up of their estimate of daily growth rate of 1 ± 2% per day (which equates more closely to R=1.05 with range of 0.95-1.15). Yes, they really did say that the epidemic might be shrinking by 1% per day, even as cases are soaring and hospital admissions are rising. I do understand how they've managed to generate this answer - some of the estimates that feed into their calculation only use death data, and this is still very flat - but it's such obvious nonsense that they really ought to have pulled their heads out of their arses by now. I sometimes think my work is a bit artificial and removed from practical issues but their unwillingness to bend to reality gives ivory tower academics a bad name.
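The conversion from SAGE's daily growth rate to R in the paragraph above is easy to check, again assuming the rule of thumb that R is roughly the growth factor over 5 days with daily compounding. The function name here is hypothetical.

```python
def r_from_daily_growth(pct_per_day, generation_days=5):
    """Approximate R as the growth factor over one 5-day generation."""
    return (1.0 + pct_per_day / 100.0) ** generation_days


# SAGE's growth rate of 1 +/- 2% per day: bottom, centre and top of the range.
sage_r = {pct: r_from_daily_growth(pct) for pct in (-1.0, 1.0, 3.0)}

for pct, r_value in sage_r.items():
    print(f"{pct:+.0f}% per day -> R of about {r_value:.2f}")
```

This gives about 0.95 and 1.05 for the bottom and centre of the range. The top end comes out nearer 1.16 than 1.15, which is just a matter of how the approximation is rounded.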

At the other extreme, a paper claiming R=1.7 was puffed in the press yesterday. It's a large survey from Imperial College, that bastion of incompetent modelling from the start of the epidemic. The 1.7 number comes from the bottom right hand panel in the below plot where they have fitted an exponential through this short subset of the full time series of data. There is of course a lot of uncertainty there. More importantly, it doesn't line up at all with the exponential fitted through the immediately preceding data set, starting at a lower level than the previous curve finishes. While R might not have been constant over this entire time frame, the epidemic has certainly progressed in a continuous manner, which would imply the gap is filled by something like the purple line I've added by hand.

It's obviously stupid to pretend that R was higher than 1 in both of the recent intervals where they made observations, and just happened to briefly drop below 1 exactly in the week where they didn't observe. The sad thing about the way they presented this work to the media is that they've actually done a rather more sensible analysis where they fit the 3rd and 4th intervals simultaneously, which is shown as the green results in the 3rd and 4th panels on the top row of the plots (the green in the 3rd panel is largely overlain by blue which is the fit to 2nd and 3rd intervals, but you can see if you click for a larger view). Which gives.....R=1.3. Who'd have thought it?

Of course R=1.7 is much more headline-grabbing. And it is possible that R has increased towards the end of their experimental period. Rather than fitting simple exponentials (ie fixed R values) to intervals of data, perhaps a more intelligent thing to do would have been to fit an epidemiological model where R is allowed to vary through time. Like I have been doing, for example. I'm available to help and my consultancy rates are very reasonable.
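For concreteness, fitting a simple exponential (ie a fixed R) to an interval of data amounts to fitting a straight line to the logarithm of the counts. The sketch below does this by ordinary least squares on synthetic data. It is not the statistical model used in the survey paper, and it is certainly not a time-varying fit; the function name and the data are invented for illustration.

```python
import math


def fit_exponential(days, counts, generation_days=5):
    """Least-squares fit of counts ~ level * exp(rate * day).

    Returns (rate per day, level at day 0, approximate R), where R is taken
    as the growth factor over generation_days. All counts must be positive.
    """
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    rate = sxy / sxx
    level = math.exp(mean_y - rate * mean_t)
    return rate, level, math.exp(rate * generation_days)


# Synthetic, noise-free data growing at 5% per day (illustration only).
days = list(range(14))
counts = [100.0 * math.exp(0.05 * t) for t in days]

rate, level, r_value = fit_exponential(days, counts)
print(f"rate {rate:.3f} per day, level {level:.1f}, R of about {r_value:.2f}")
```

Fitting two adjacent intervals separately gives two unconnected lines in log space, which is exactly how you end up with one curve starting below where the previous one finished. Fitting them jointly, or letting R vary smoothly in time, forces the fitted epidemic to be continuous.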

In conclusion, R=1.3(ish) but this is a significant rise on the value it took previously and it might well be heading higher.