TL;DR version: when the model is fitted to the data, the doubling time is around 3 days. This is far more rapid than the Imperial College (IC) modelling, and (if correct) could mean that the peak would be upon us faster than they predicted.
A continuation of my investigations with the SEIR model which I have presented in previous posts especially
here and
here. Use
the label for the full set. I'm a little surprised there hasn't been more competent statistical modelling appearing yet. I know there are plenty of extremely talented statisticians around who can do this sort of thing in their sleep, and yet the modelling that has appeared recently seems a bit...limited. The
Oxford study was plainly silly (at least in its presentation) and even the IC study which is the direct foundation of Govt policy didn't do any sort of meaningful uncertainty analysis. I'm a bit less embarrassed to show my stuff given the level of what the pros have done.
I am however a little embarrassed that it took me so long to discover a significant issue with the IC study which I will now discuss. A while back, Vallance mentioned a 5 day doubling time (on
this youtube video, starting from 27 mins in). And people were puzzled. Where did this come from? The data show a much more rapid growth, eg on this excellent analysis which is regularly updated
in the FT:
Most countries are close to doubling every 3 days, which is a much more alarming position.
Well, it turns out that 5 days is baked in to the parameter values of the IC model! The doubling time is a function of R0 and the latent and infectious periods, which they set according to their interpretation of the literature and research. Importantly, through their use of a single data point to calibrate their model, they made no attempt at all to fit the observed growth rate, merely setting the start point of the epidemic in their model at the level that ensures their simulation had the correct number of total deaths by mid-March). So, I am trying to tune my model in more detail in this post to take account of the information we have regarding the growth rate.
I've taken a fairly subjective view, which I believe is necessary given the limitations of both the data and the model. The simple model I'm using cannot be expected match the data precisely but I'm trying to get the general gist of it, and allow a reasonable range of plausible model fits. I have changed my priors from the previous attempt, in order to provide more support for shorter doubling times as this is clearly justified by the international data shown above. I have also done an explicit calculation of deaths (based on a proportion of infected cases dying after a significant time delay) and am using both the death data and the reported number of cases in my model fitting, with the latter adjusted as reasonably as I can do for under-reporting.
With no further ado, here are the results. I'm just using a simple rejection sampling (likelihood weighting) approach, with uncertainties in R0, latent period, death rate and starting size. No real need for anything more sophisticated with only 4 parameters and a quick model. Here is my prior predictive distribution, and posterior, over the initial segment where we have observations, presented in log-space as this changes the exponential growth to linear and clarifies what the model is and isn't doing....
Total (UK) cases first. All ranges presented are 5-95%, so there is a chance of being outside, but not a great one:
I have just used the start and end of each observational time series, there's no point in using all the intermediate data points as they are very close to a straight line anyway. Although, one could probably argue that the growth rate in the very very earliest stage might be different to that generally applicable (eg due to contact tracing and the atypical nature of incoming infections), and I might come back to that in future work. For the number of reported cases, I've assumed that the early numbers were fairly reliable and the later one substantially under-reported along the lines of
Kucharski, as I mentioned earlier. I should also mention that I make a further assumption of 50% of cases being symptomless and a death rate of about 1% (also an uncertain parameter in this work)
for the symptomatic cases, ie 0.5% of
all infections. I do have some uncertainty in this death rate parameter so it could be a bit higher or lower. In the above graph I have plotted the official data as black circles, the red H marks indicate the estimates I'm using in the fit, together with their uncertainties (ie, the uncertainties that I have subjectively attached to them, these are not truly uncertainties of the data as such). There is lots of room to quibble with my judgements!
Total deaths next, and I set the axes to give the same aspect ratio, which is why the labels are a bit odd (ok, laziness played a rôle too):
For these death data, I'm assuming the reported values are themselves reliable (up to the day before yesterday!) so not adjusting them directly. But still allowing for significant mismatch in the fit.
The fitted parameters suggest a slightly higher R0 of about 2.6 and a latent period of only 2.5 days. These, together with the infectious period which I fixed at 4 days, combine to give a doubling interval of just under 3 days which is even shorter than in my adjusted prior let alone the 5 days of the IC report. Clearly these data (at least, my interpretation of them) point towards a rapid growth rate. You can see in the above graph I'm still not matching the growth rate in deaths which is actually doubling every 2.2 days over this period. These death data seem to be going up rather faster than the reported cases even after I have adjusted the latter for estimated under-reporting, and the model does not achieve this. I don't think this can be a model limitation per se as simple theory implies that all the growth rates of all variables should be the same in this early stage of exponential growth - cases, symptomatic cases, serious illness, death, recoveries. All of them should double at the same rate, and therefore be linked by simple proportionality rules. But in fact the first few deaths were at a rate of 1 per hundred current cases, now it's more like 5. Maybe the under-reporting is worse than I think? Food for thought surely.
Now some projections for the coming summer. Here are the epidemic curves under a business-as-usual course of taking no action. I'm showing first the total number of cases, then the total deaths. In the first of these, I also showed a handful of the individual curves to show the typical shape - as I mentioned previously, it is not the case that we might follow along the lower bound of the 5-95% range, this is instead the envelope within which most of the curves sit. All of them give a large rapid epidemic.
It's over by the end of May in terms of infections - even the start of May in some cases - but the bodies will probably continue to pile up for another few weeks after that:
Remember that this is just assuming a 0.5% death rate, which is not particularly pessimistic. I think the model probably over-predicts the proportion of us that get the disease, but not by a huge margin. And 0.5% might well be optimistic in such a compressed time frame when we don't have capacity to treat people. If you don't like the notation for the axis, it's about 250,000-500,000 deaths.
Now, the obvious question is, what should we do about this? The results of Ferguson et al suggest that changes in social behaviour can reduce R0 significantly, and this is easy to simulate, though the reduction itself is just something I have to impose (based loosely on fits to Ferguson et al).
Here are two possible futures. I've assumed we either achieve (a) a 50% reduction in R0, which is pretty optimistic, or (b) a 70% reduction in R0, which is incredibly optimistic. Both of these are applied for a 3 month period (the shaded region) and then released. The results are.....not really all that great.
If we suppress strongly now (and then relax the controls), the epidemic comes back for a second go later, rather as
Thomas showed. The red curve is slightly better than the others, but the total deaths aren't that different. Of course this analysis completely ignores the possible development of treatment or vaccines that could completely change the picture. Let's hope so cos these pictures are not really very appealing!
I'm sure I could think of more things to do, but this is enough for now. Comments welcome. It's a shockingly compressed time scale compared to what the IC group have predicted - they had critical care demands peaking in June! - so I hope their judgement over parameter choices proves to be more appropriate than what this model fitting achieved.