Tuesday, March 31, 2020

The new study from IC

I haven't had time to examine this new research in great detail but it looks pretty good to me (maybe one or two minor caveats). They have fitted a fairly simple mechanistic statistical model to time series data for deaths in a large number of European countries, using a Bayesian hierarchical modelling approach to estimate the effects of various "non-pharmaceutical interventions" like shutting schools and banning large meetings which have been widely adopted. I suspect (I could be wrong, though) that the reason for using a primarily statistical model is that it's quicker and easier for their method than a fully dynamical model like the one I'm using. They say it is fundamentally similar to SIR though, so I don't think their results should be any the worse for it (though perhaps the E box makes a difference?).

First and foremost, by far the most significant result IMO is that they find the initial R0 to be much larger than in the complex mechanistic modelling of Ferguson et al. This confirms what I suggested a few days ago, which was based on the fact that their value of R0 = 2.4 (ish) was simply not compatible with the rapid exponential growth rate in the UK and just about everywhere else in the western world.

Here are pics of their values for a number of countries, for their preferred reproductive time scale of 6.5 days. Left is prior to interventions, right is current post-intervention estimates.

 


You may spot that all the values on the right imply a substantial probability of R > 1, as I also suggested in my recent post. If the initial R0 is high, it's hard to reduce it enough to get R below 1. If true, that would mean ongoing growth of the epidemic, albeit at a lower rate and with a longer, lower peak than would be the case without the interventions. I will show over the next few days what the implications of this could be over the longer term.

It's also important to realise that these values on the right depend on the data for all countries - this is the "hierarchical" bit - there is no evidence from the UK data itself of any significant drop in R, as you can work out from the plot below, which gives the fit and a one-week forecast. There simply hasn't been long enough since the restrictions were put in place for anything to feed through into the deaths. Though if the curve bends in the near future, that will be a strong indication that something is going on. They appear to be 3 days behind reality here - it looks like they haven't got the two small drops and the most recent large rise.




Despite my best intentions, I don't really see much to criticise, except that they should have done this a week or two ago before releasing the complex modelling results in which they made such poor parameter choices. Obviously, they would not have been able to estimate the effect of interventions using this method, but they'd have got the initial growth rate right.

It's a little like presenting forecasts from one of the new CMIP6 models without checking that it has roughly the right warming rate over the last few decades. You'd be better off using an energy balance model that was fitted to recent history, especially if you were primarily interested in the broad details such as the large-scale temperature trend. In fact that's a pretty direct analogy, except that the epidemic model is much simpler and more easily tuned than a state-of-the-art climate model. Though they also had a bit less time to do it all :-)

As for the caveats however - I had to laugh at this, especially the description above:



I remember seeing a similarly impressive "large correspondence" plotted in a climate science paper a while back. I pressed the authors for the correlation between the two sets of data, and they admitted there wasn't one.

But that's a minor quibble. It's a really interesting piece of work.

I am now just about in a position to show results from my attempts at doing proper MCMC fits to time series data with the SEIR model so it will be interesting to see how the results compare. This can also give longer-term forecasts (under various scenario assumptions of course).

Sunday, March 29, 2020

Is R0 larger than people think?

EDIT: as a quick update to this, I'd just like to point to the new IC report out here, which vindicates what I've written below. Yes, R0 is higher than they thought, and this does mean that control is far harder to achieve (and quite possibly will not be achieved with current policies).

You read it here first...

--------


Is R0 really big?  R0 here being the average number of people that each infected person passes the disease on to, assuming a population with no immunity. The reason I wonder, is that the doubling rate in the death data is well under 3 days - under 2.5 in fact over the full time series from the first death to yesterday's data. There hasn't been a single period of 6 days in which the total number of deaths in the UK has not at least quadrupled, apart from when the govt sneaked in a short 8 hour “day” into the official time series recently. I see this has been adjusted for in some data sets, but not the one I have to hand - it won't matter in the long run of course. 

Incidentally, it should not be forgotten that the official death toll now only counts hospitalised cases. Those who don't get that far, and aren't tested, aren't counted. I'll assume this is a minor factor for now and ignore it completely in this analysis.

Doubling time depends on 3 parameters in the SEIR model - R0, and the latent and infectious periods which I'll abbreviate to L_p and I_p. I_p is the time interval during which ill people are passing on the disease, which can hardly be less than a day. I'll fix it at 2, which is also much shorter than most estimates I've seen. It's important to realise that L_p and I_p are basically biological parameters of the disease. They may vary a bit from place to place, but it's hard to imagine wild variation. R0, on the other hand, is substantially a social construct. It varies with how many people we meet, and how we behave with hygiene and cleanliness. It is R0 we aim to change through social distancing and disease control measures such as quarantining etc. A value of R0 calculated from another country and culture may be rather unsuitable for our situation. The value of R0=2.4 that Ferguson et al preferred for their main calculations (with a range of 2.0-2.6) was from the first few people infected in China, and may be unrepresentative for a number of reasons.
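For anyone who wants to play along at home, here's roughly what this sort of model looks like - a minimal SEIR sketch in Python with a crude Euler integration. To be clear, the parameter names and numbers are my own illustration, not the actual code behind any of my figures:

```python
def run_seir(R0, L_p, I_p, N=66e6, I0=100.0, days=300, dt=0.1):
    """Integrate a basic SEIR model with a simple Euler scheme.

    beta = R0/I_p (transmission), sigma = 1/L_p (rate of leaving latency),
    gamma = 1/I_p (rate of leaving infectiousness).
    Returns the final (S, E, I, R) state.
    """
    beta, sigma, gamma = R0 / I_p, 1.0 / L_p, 1.0 / I_p
    S, E, I, R = N - I0, 0.0, I0, 0.0
    for _ in range(int(days / dt)):
        dS = -beta * S * I / N
        dE = beta * S * I / N - sigma * E
        dI = sigma * E - gamma * I
        dR = gamma * I
        S, E, I, R = S + dS * dt, E + dE * dt, I + dI * dt, R + dR * dt
    return S, E, I, R

# Illustrative run: short latent and infectious periods, R0 = 2.7
S, E, I, R = run_seir(2.7, 3.5, 2.0)
print(f"final attack rate: {R / 66e6:.0%}")  # most of the population infected
```

Nothing fancy, but it shows where R0, L_p and I_p sit in the machinery.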

Even with I_p fixed at a very small value, in order to get a short doubling time, you still need either to have a small value for L_p, or a large R0. Most studies put L_p around 4-5 days which would mean R0 has to be pretty huge. You do have to be a little careful in interpreting the epidemiological analyses, as people often define L_p as the time taken to develop symptoms, whereas the model actually only cares about when you start to infect others - and there is evidence that this is a slightly quicker process. But still, a latent period of 3 days would be very short compared to what most people are saying.

If the latent period is 3.5 days, and the infectious period is 2 - both values which seem to be below the range of widely accepted estimates - then R0 has to be 2.7 even to get a doubling time of 3 days, which is a bit slower than the data suggest. R0 = 2.7 is outside the range of values that were even tested by Ferguson et al. If you take the more accurate but pessimistic view that the doubling time is actually 2.5 days (the data are comfortably quicker even than this!) then R0 has to be 3.2.
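The arithmetic here comes from the standard linearisation of the SEIR model in the early exponential phase: the growth rate r is the positive root of (r + 1/L_p)(r + 1/I_p) = R0/(L_p I_p). A quick sketch to check the numbers (illustrative code, not lifted from my actual analysis):

```python
import math

def doubling_time(R0, L_p, I_p):
    """Early-epidemic doubling time (days) from the SEIR linearisation.

    The growth rate r is the positive root of the quadratic
    (r + 1/L_p)(r + 1/I_p) = R0 / (L_p * I_p).
    """
    sigma, gamma = 1.0 / L_p, 1.0 / I_p
    b = sigma + gamma
    r = (-b + math.sqrt(b * b + 4 * sigma * gamma * (R0 - 1))) / 2.0
    return math.log(2) / r

print(doubling_time(2.7, 3.5, 2.0))  # just under 3 days
```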

Why does this matter? The control, mostly. Ferguson et al assume we can reduce R0 substantially by the extreme control measures that are put in place. Their “suppression” strategy takes a basic value of R0=2.2 and assumes it will be reduced by about 70% though social distancing and all the other measures which are put in place.

Reducing 2.2 by 70% gets it well below 1 and control is rapidly achieved. Reducing 3.2 by 70% just gets to below 1, barely. If R0 is any bigger, then a 70% reduction still leaves it greater than 1 meaning we progress through a full epidemic, albeit a smaller one than we'd have had without the controls.

In the graphs below I've tried a range of combinations with different pairs of R0 and L_p, all of which have exactly the same doubling time of 3 days - at the slow end of what might be compatible with the data, though at the quick end of what the Govt is currently planning for. These simulations are also all tuned to have 1000 deaths at the present date.


With no intervention, the epidemics are identical at first and broadly similar in their entirety, only differing in the point at which herd immunity is reached. We see about 600,000 deaths at a 1% death rate.


If we introduce controls in the middle of March to reduce R0 by 70%, then the second plot results. What a difference! For the low R0 values, the epidemic is rapidly controlled, though the deaths lag this a bit. If R0 is large, even though the growth is completely indistinguishable up to this point, the futures are very different. The growth remains positive for a while, before the epidemic levels out and decays. Note this assumes we keep the control measures in place indefinitely.

Here's a blow-up of the last plot just focussing on the period around the present so you can see the detail. And I will also show daily deaths as this gives a clearer indication of changes.



I suspect this model reacts too sharply to changes in parameters, compared to reality where there are numerous sources of variability that this model does not capture. That tends to smooth things out rather. Nevertheless, it suggests it will be a few days into April before we could possibly expect to recognise a levelling-off in the death rate. A key question will be whether we keep below 1000 deaths per day, which I think would be an optimistic hope at this point. I emphasise that the lower lines depend on really rather extreme values for the time scale parameters of the virus, which in principle I'd think should be reasonably well understood by virologists and the like. The lower two lines have latent periods of just under 2 and 3 days respectively!

The Govt's talk of 20,000 deaths looks to be a bit hopeful to me.


Friday, March 27, 2020

More on parameters and rates

So it seems the Govt got the memo (I'll add a link when I see a write-up of Gove's comments). Though their estimate of 3-4 days for the doubling period is still longer than the data indicate - it was probably under 3 in my estimation, and that was way back in the mists of time when the latest figure for deaths was only 422. Two days on (plus a bit, shifting the reporting deadline from 9am to 5pm so really 56 hours) and it's 759. That's a doubling per 2.75 days unless my maths skillz are all gone to pot.
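For anyone who wants to check my skillz, the sum is just this (numbers taken straight from the paragraph above):

```python
import math

deaths_then, deaths_now = 422, 759
elapsed_days = 56 / 24  # reporting cutoff shifted from 9am to 5pm: 56 hours, not 48
doubling = elapsed_days * math.log(2) / math.log(deaths_now / deaths_then)
print(round(doubling, 2))  # about 2.75 days
```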

I hope that they have also realised the consequences for the timing and magnitude of the peak, because it really makes a bigly huge difference in bringing it all upward and forward as I showed yesterday. An awful lot hangs on whether this social distancing thing really works, because if not, it's going to be pretty dreadful. It's already pretty bad, and there are several doublings to go.

One important detail to remember in the parameter estimation/model fitting process is that with 3 parameters all playing a role in determining the net growth rate, we have a highly underdetermined problem when we look at population-level time series data alone, as this is just a simple exponential curve with a particular magnitude and growth rate. If I tell you that A×B×C = 35.4, you are still none the wiser as to what A, B, and C are. The flip side is that this lack of determinacy isn't a huge problem in the simple simulation of the initial exponential growth curve - all that matters is the growth rate, and the data do constrain that. It does start to matter when you consider the later stages where herd immunity comes into play, or the response to a changed social situation such as controls and shutdowns. These are primarily linked only to R0.

So anyway, my first assumption when wondering what had led Ferguson et al astray was that they might have used R0 from one epidemiological study and the latent period from another, picking values that were separately plausible but jointly wrong. I was a bit surprised to see the same ref listed for both. It reports a doubling time of 7.4 days for the early stages of the Wuhan outbreak. So, shrug. Either the data there are unreliable or the social situation is sufficiently different as to be inapplicable. In any case, it was a clear mistake to rely so strongly on one small data set. We have an abundance of empirical evidence that the intrinsic doubling time in multiple western societies (prior to any large-scale behavioural changes) is about 3 days, quite possibly even shorter.

A consequence of the parametric indeterminacy is that we don't know if the latent period is short and R0 smallish, or whether both are rather larger. This doesn't matter for the historical curve (which is of course precisely why they are undetermined) but it does when you start to consider interventions. If we assume (ok, it's an assumption, but we have little choice here with this sort of model) that a given social intervention acts so as to halve the amount of contact people have and consequently halve the value of R0, then a change of R0 from 8 to 4 (assuming reproductive time scale of 9 days) is much less effective than a change from 4 to 2 (and a 6 day time scale) or 2 to 1 (with a 3 day generation). The doubling time was initially 3 days in each case, and changes to 4.5 days in the first case, 6 days in the second and the epidemic stops in its tracks for the third. I should probably make a pic of this but I can't be bothered. Use your imagination.
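Since I can't be bothered with a pic, here's the back-of-envelope version instead. These numbers fall out of the discrete-generation approximation, doubling time = T·ln(2)/ln(R0) for generation interval T - an illustrative sketch, nothing more:

```python
import math

def doubling_time(R0, T):
    """Doubling time for growth factor R0 per generation interval of T days."""
    return T * math.log(2) / math.log(R0) if R0 > 1 else math.inf

# Each pairing has a 3-day doubling time initially; halving R0 has very
# different effects depending on where you start.
for R0, T in [(8, 9), (4, 6), (2, 3)]:
    print(f"R0 {R0} -> {R0 / 2}: doubling "
          f"{doubling_time(R0, T):.1f} -> {doubling_time(R0 / 2, T):.1f} days")
```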

It seems to me that the only way these parameters can be reasonably estimated is from detailed case studies, as in the cited Wuhan study which actually looked at time from infection to symptoms. Though, if the disease is infectious sooner than that (as appears to be the case) this isn't really the right time scale to use anyway. Regardless of the reasons, it's doubling much more rapidly than every 7.4 days.



Thursday, March 26, 2020

Fitting the COVID-19 SEIR model to data, part 2

TL;DR version: when the model is fitted to the data, the doubling time is around 3 days. This is far more rapid than the Imperial College (IC) modelling, and (if correct) could mean that the peak would be upon us faster than they predicted.

A continuation of my investigations with the SEIR model which I have presented in previous posts especially here and here. Use the label for the full set. I'm a little surprised there hasn't been more competent statistical modelling appearing yet. I know there are plenty of extremely talented statisticians around who can do this sort of thing in their sleep, and yet the modelling that has appeared recently seems a bit...limited. The Oxford study was plainly silly (at least in its presentation) and even the IC study which is the direct foundation of Govt policy didn't do any sort of meaningful uncertainty analysis. I'm a bit less embarrassed to show my stuff given the level of what the pros have done.

I am however a little embarrassed that it took me so long to discover a significant issue with the IC study which I will now discuss. A while back, Vallance mentioned a 5 day doubling time (on this youtube video, starting from 27 mins in). And people were puzzled. Where did this come from? The data show a much more rapid growth, eg on this excellent analysis which is regularly updated in the FT:



Most countries are close to doubling every 3 days, which is a much more alarming position.

Well, it turns out that 5 days is baked into the parameter values of the IC model! The doubling time is a function of R0 and the latent and infectious periods, which they set according to their interpretation of the literature and research. Importantly, through their use of a single data point to calibrate their model, they made no attempt at all to fit the observed growth rate, merely setting the start point of the epidemic in their model at the level that ensured their simulation had the correct number of total deaths by mid-March. So, I am trying to tune my model in more detail in this post to take account of the information we have regarding the growth rate.

I've taken a fairly subjective view, which I believe is necessary given the limitations of both the data and the model. The simple model I'm using cannot be expected to match the data precisely but I'm trying to get the general gist of it, and allow a reasonable range of plausible model fits. I have changed my priors from the previous attempt, in order to provide more support for shorter doubling times as this is clearly justified by the international data shown above. I have also done an explicit calculation of deaths (based on a proportion of infected cases dying after a significant time delay) and am using both the death data and the reported number of cases in my model fitting, with the latter adjusted as reasonably as I can for under-reporting.

Without further ado, here are the results. I'm just using a simple rejection sampling (likelihood weighting) approach, with uncertainties in R0, latent period, death rate and starting size. No real need for anything more sophisticated with only 4 parameters and a quick model. Here is my prior predictive distribution, and posterior, over the initial segment where we have observations, presented in log-space as this changes the exponential growth to linear and clarifies what the model is and isn't doing....
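For anyone unfamiliar with likelihood weighting, here's the flavour of it, with a toy exponential-growth model standing in for the SEIR run. All the priors, numbers and the "observation" below are purely illustrative - they are not the ones I actually used:

```python
import math
import random

random.seed(1)
n = 20000

# Toy priors: per-day growth rate and log10 of the starting size
prior = [(random.gauss(0.23, 0.05), random.gauss(0.0, 2.0)) for _ in range(n)]

# Toy "observation": ~1000 cumulative deaths after 40 days, log10 error sd 0.2
t_obs, obs = 40.0, math.log10(1000.0)

def loglik(growth, log_start):
    pred = log_start + growth * t_obs / math.log(10)  # log10 of modelled total
    return -0.5 * ((pred - obs) / 0.2) ** 2

# Likelihood weighting: resample the prior ensemble in proportion to likelihood
weights = [math.exp(loglik(g, s)) for g, s in prior]
posterior = random.choices(prior, weights=weights, k=n)

growths = sorted(g for g, _ in posterior)
median_growth = growths[n // 2]
print(math.log(2) / median_growth)  # implied doubling time in days
```

The same idea scales up directly: replace the one-line exponential with a full model run and the single observation with the whole time series.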

Total (UK) cases first. All ranges presented are 5-95%, so there is a chance of being outside, but not a great one:

I have just used the start and end of each observational time series; there's no point in using all the intermediate data points as they are very close to a straight line anyway. Although one could probably argue that the growth rate in the very earliest stage might be different from that generally applicable (eg due to contact tracing and the atypical nature of incoming infections), and I might come back to that in future work. For the number of reported cases, I've assumed that the early numbers were fairly reliable and the later ones substantially under-reported along the lines of Kucharski, as I mentioned earlier. I should also mention that I make a further assumption of 50% of cases being symptomless and a death rate of about 1% (also an uncertain parameter in this work) for the symptomatic cases, ie 0.5% of all infections. I do have some uncertainty in this death rate parameter so it could be a bit higher or lower. In the above graph I have plotted the official data as black circles; the red H marks indicate the estimates I'm using in the fit, together with their uncertainties (ie, the uncertainties that I have subjectively attached to them; these are not truly the uncertainties of the data as such). There is lots of room to quibble with my judgements!

Total deaths next, and I set the axes to give the same aspect ratio, which is why the labels are a bit odd (ok, laziness played a rôle too):

For these death data, I'm assuming the reported values are themselves reliable (up to the day before yesterday!) so not adjusting them directly. But still allowing for significant mismatch in the fit. 

The fitted parameters suggest a slightly higher R0 of about 2.6 and a latent period of only 2.5 days. These, together with the infectious period which I fixed at 4 days, combine to give a doubling interval of just under 3 days which is even shorter than in my adjusted prior let alone the 5 days of the IC report. Clearly these data (at least, my interpretation of them) point towards a rapid growth rate. You can see in the above graph I'm still not matching the growth rate in deaths which is actually doubling every 2.2 days over this period. These death data seem to be going up rather faster than the reported cases even after I have adjusted the latter for estimated under-reporting, and the model does not achieve this. I don't think this can be a model limitation per se as simple theory implies that all the growth rates of all variables should be the same in this early stage of exponential growth - cases, symptomatic cases, serious illness, death, recoveries. All of them should double at the same rate, and therefore be linked by simple proportionality rules. But in fact the first few deaths were at a rate of 1 per hundred current cases, now it's more like 5. Maybe the under-reporting is worse than I think? Food for thought surely.
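To spell out the proportionality argument: if everything grows as e^(rt), and deaths lag infections by a fixed delay with a fixed fatality fraction, then the deaths-to-cases ratio should sit at a constant f·e^(-r·lag). A toy illustration (all numbers invented for the purpose):

```python
import math

r = math.log(2) / 3  # growth rate corresponding to a 3-day doubling time
f, lag = 0.01, 17    # fatality fraction and infection-to-death lag (days)

# In pure exponential growth, deaths are just a scaled, lagged copy of cases,
# so the ratio is the same whenever you look.
for t in [10, 20, 30]:
    cases = math.exp(r * t)
    deaths = f * math.exp(r * (t - lag))
    print(round(deaths / cases, 6))  # same value every time
```

A ratio that drifts over time therefore points to something outside the model, such as worsening under-reporting of cases.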


Now some projections for the coming summer. Here are the epidemic curves under a business-as-usual course of taking no action. I'm showing first the total number of cases, then the total deaths. In the first of these, I also showed a handful of the individual curves to show the typical shape - as I mentioned previously, it is not the case that we might follow along the lower bound of the 5-95% range, this is instead the envelope within which most of the curves sit. All of them give a large rapid epidemic.


It's over by the end of May in terms of infections - even the start of May in some cases - but the bodies will probably continue to pile up for another few weeks after that:






Remember that this is just assuming a 0.5% death rate, which is not particularly pessimistic. I think the model probably over-predicts the proportion of us that get the disease, but not by a huge margin. And 0.5% might well be optimistic in such a compressed time frame when we don't have capacity to treat people. If you don't like the notation for the axis, it's about 250,000-500,000 deaths.

Now, the obvious question is, what should we do about this? The results of Ferguson et al suggest that changes in social behaviour can reduce R0 significantly, and this is easy to simulate, though the reduction itself is just something I have to impose (based loosely on fits to Ferguson et al).

Here are two possible futures. I've assumed we either achieve (a) a 50% reduction in R0, which is pretty optimistic, or (b) a 70% reduction in R0, which is incredibly optimistic. Both of these are applied for a 3 month period (the shaded region) and then released. The results are.....not really all that great. 





If we suppress strongly now (and then relax the controls), the epidemic comes back for a second go later, rather as Thomas showed. The red curve is slightly better than the others, but the total deaths aren't that different. Of course this analysis completely ignores the possible development of treatment or vaccines that could completely change the picture. Let's hope so cos these pictures are not really very appealing!



I'm sure I could think of more things to do, but this is enough for now. Comments welcome. It's a shockingly compressed time scale compared to what the IC group have predicted - they had critical care demands peaking in June! - so I hope their parameter choices prove to be more appropriate than the values this model fitting has produced.




Wednesday, March 25, 2020

Dominic Cummings

Dominic Cummings is a stupid person's clever person.

That's it. That's the tweet.

The long version is, he's a stupid person's clever person because he has read some popular science and juvenile edgy libertarian blogs and can regurgitate great soundbites that impress people who have no clue what he is talking about, nor how science or technology works. 

Unfortunately, such people are in positions of power, meaning that they listen to him and think he's really great. Your average senior civil servant is also probably quite clever, but a history or PPE degree doesn't equip them to deal with him. I know he's a history graduate himself, and that makes his interests unusual. But there's no evidence that he understands enough about what he speaks (and blogs, at length) to really make use of it. His wish for more technical ability in government is quite possibly correct, but weirdos and misfits aren't the best choices here. These people would need the ability to interface with politicians and the real world beyond the wild imagination of their (typically wrong or impractical) theories. There are very many talented scientists with great technical skills, creativity, originality of thought and the social skills to inform, explain, and persuade. I think that most of them would run a mile from the prospect of working in a policy unit in Whitehall. Though there are of course channels for scientific advice to inform on policy.

Actually, I really can't be bothered writing any more about this. Tedious, tedious, tedious. But I wanted to get it out there. If Cummings is the answer, then someone asked the wrong question.

I know I haven't backed up any of this with references. So sue me.

That Oxford study, in full, in brief

This post refers. And this manuscript which has received an unreasonable amount of attention.

Here are a few simulations from the SEIR model I've been using, with different death rates (presented in the pic as death per case), adjusted to give the same cumulative death toll in the early phase of the epidemic. Death is assumed to lag infection by 17 days as in the Oxford study. The model used here is possibly just a bit simpler than theirs in some respects but very similar as to the overall behaviour. I have not tried to calibrate precisely to observed data as I'm just making a simple theoretical point. By shifting the epidemic curve backwards and forwards, I can match the death toll from all of these models up to where it peaks, at which point we are already past the peak of the epidemic. The only thing you can rule out from the initial exponential growth (linear in log space) is that we haven't passed the peak as in the pink curve (perhaps also cyan), or else deaths would already have tailed off.

In reality we have lots of reasons to discount the lowest and highest extreme values, and we have observed a lot more than just the death toll. For example, if there really had been a large epidemic by early Feb, the first discovered cases would not all have been clearly linked to foreign travel (and each other). And, contrary to their claim that only 1 in 1000 are seriously ill, there are decent-sized areas in Italy where a higher proportion than this are already dead. All this research shows is that the death toll doesn't tell us how big the epidemic is (and therefore how high the mortality rate is) until after we've passed the peak. Rightly, it is getting a drubbing on Twitter. It's all very well playing mathematical games like this on an idle afternoon but I think it's deeply irresponsible to have pushed this out to the press, as it suggests a level of uncertainty and disagreement among experts that simply isn't there.

Tuesday, March 24, 2020

A few more thoughts about parameter estimation and uncertainty in epidemic modelling

Last night's post was a bit rushed and devoid of analysis. I'm not surprised to see there have been some other recent model-fitting investigations by recognised researchers in the field, including this one. The model they used is in fact marginally simpler than the one I was considering, in that it does not have a latent period, but on the other hand they explicitly model death and present their equation, which is quite handy as I think I should be able to add this to the SEIR model I'm using :-)

Something I should have pointed out yesterday. If all you have is data from the initial exponential part of an epidemic curve, then it's a straight line (in log space) and only constrains two degrees of freedom - a growth rate, and a magnitude. I had 4 uncertain parameters, 3 biological (which jointly determine the rate) and a starting value (ie magnitude). Clearly, my problem is hugely underconstrained by this data and so my priors will have played a large role in determining the results.

Worse, if the data are likely to be under-reported by a large but unknown factor, then we can't constrain the magnitude at all. This is quite likely the case when we look at reported cases of illness, as the most mild cases may never be discovered. In the UK recently, many very likely cases are not tested at all unless the patient becomes seriously ill. If we include in our estimation process an under-reporting factor which is itself unknown, then the magnitude of the epidemic will also be unknown (ok, pedants will note it is bounded below, but probably by an unreasonably low value). This latest paper above and the Ferguson et al research both used deaths to calibrate their models. Death data are probably much more secure than cases of illness, but the relationship between infection and death is again highly uncertain. By assuming a very low death rate, we can estimate the current infection level to be high (so the epidemic is widespread but probably not so harmful), and conversely a high death rate means that current infection level is low and we are in for a bumpy future.

As far as I can see, the Oxford group has basically played this game by introducing a parameter to represent the proportion of the population which is susceptible to a severe form of the disease. This parameter acts as a simple scaling factor in the death equation for their model. When this is set very small, it means the epidemic is very large, which also means it started earlier than thought (though given the nature of exponential growth, not necessarily by an unbelievable margin). When the parameter is large, the epidemic is currently small but will get much more serious.

These seem to be fairly fundamental road-blocks to doing any more detailed parameter estimation, so all results will necessarily be highly dependent on prior assumptions. Even the most elementary model has more degrees of freedom than can be constrained by time series data in the initial phase. It gets better once we are past the peak, as we know the proportion infected is then a substantial part of the population, and this will also help to get a handle on the fatality rate (though the lag means that could take a little longer). But by then it's too late to do anything about it.

Studying sub-populations that have already experienced the disease is one thing that may help. If the death rate is as low as one end of the Oxford paper suggests, how did 7 people die on that cruise ship (out of 700 cases, where there was regular testing and the total number of people was about 3500)? Bad luck, or were they just a particularly unhealthy bunch? And does their fit to Italian data (which implies that the epidemic is basically over) work for Lombardy as opposed to the whole country? Enquiring minds etc...

Edit: OK according to Wikipedia, Lombardy has 60% of the deaths but only 10% of the population. If the Oxford model (with low-death parameter) was fitted to this, I'm sure that it would put them right at the end of their epidemic with people no longer falling ill at much of a rate. However, despite the lockdown having been in effect for a while, there are still a lot of cases being reported. I say that refutes their idea, though it would take a proper calculation to be sure.

A test of Vallance's claim about cases?

Not so long ago, Vallance claimed that there could be about 1,000 cases of COVID-19 for every current death in the UK. Now, the Italian death rate (as a percentage of reported cases) is extremely high, and it might also be high as a percentage of true cases including unreported. But if Vallance's estimate applies there too, the 7000 deaths in Italy so far would imply 7 million cases, a little over 10% of their population. This is just about enough to significantly bend the exponential curve, as it implies an effective reproduction number of under 90% of R0 (ie, something more like 2.3 new infections per case rather than 2.6).
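The arithmetic, spelled out (the population figure is my round number for Italy):

```python
deaths = 7000
cases_per_death = 1000  # Vallance's suggested ratio
population = 60e6       # Italy, roughly

infected_fraction = deaths * cases_per_death / population
R0 = 2.6
# Depletion of susceptibles scales the reproduction number accordingly
R_eff = R0 * (1 - infected_fraction)

print(f"{infected_fraction:.0%} infected, R effectively {R_eff:.1f}")
```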

Of course it would be a bit longer before that fed through into the observed fatality rate and with reported cases clearly under-counting by a huge factor, it's not something we could know about for a while. The slight slowdown in death rate being observed now cannot possibly be evidence of this theory yet.

Kucharski estimates a factor of 20 in the under-counting which would mean about 2% incidence rate, still a couple of doublings away from that point.

Monday, March 23, 2020

Uncertainty in the COVID-19 model

Time for James to Speak to the Nation. Well, Speak to my Reader at least. Rather than subjecting myself to the waffling windbag, here are some fun experiments in parameter estimation that I've been doing.

The setup is very simple. The SEIR model I've been using has 3 main internal parameters, being the reproduction number R0, the latent period, and the infectious period. They can all be estimated to some extent from case studies, but the appropriate numbers to use in a model are not precisely known. Beyond these parameters, the only remaining tunable value is the number of infected people at the start date (this date isn't worth changing itself as you just go up and down the exponential curve). It seems that Ferguson et al picked the biological numbers from the recent literature and then chose the start value to hit a given number of deaths from their simulations on a date in mid-March. No uncertainties are reported in their analysis, though they have used a few different R0 values in some scenarios. So I think it's worth investigating how much the parameters could reasonably vary, and how much difference this might make.

I've just done a very simple tuning, fitting to values at the start of Feb and 20th March. Since the data are essentially exponential there is little point in using more intermediate data. All it can tell me is the rate of growth and its current level.

Priors for the model parameters are chosen rather arbitrarily by me, based on my previous tuning.
R0 ~ N(2.4,0.3)
Latent_p ~ N(4,1)
Infectious_p ~ N(4.5,1)
Start ~ exp(N(-20,1))

All uncertainties quoted at one standard deviation.

Sampling from this prior gives a wide range of possible epidemics, as shown below. The blue plume is the 5-95% range with the median line also drawn. This plume may be a bit misleading as it seems to include the possibility of no epidemic at all! Actually what is happening here is the different parameter sets lead to a different timing of the peak, so at any moment in time quite a lot of them have either not started, or ended. I've plotted a handful of the individual curves as red lines to show that. In fact all parameter sets do lead to a large epidemic, which is an immediate and unavoidable consequence of the R0 value.
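This prior ensemble is easy to reproduce. Below is a minimal sketch (my own forward-Euler SEIR, not the exact model used for these plots, with the plotting itself omitted; the clipping of the period parameters is my addition to keep stray prior draws positive):

```python
import numpy as np

def seir(R0, latent_p, infectious_p, i0, days=365, dt=0.1):
    """Minimal deterministic SEIR (fractions of population), forward Euler.
    Returns the daily infectious fraction."""
    beta = R0 / infectious_p
    s, e, i = 1.0 - i0, 0.0, i0
    daily = []
    steps = round(1 / dt)
    for step in range(days * steps):
        if step % steps == 0:
            daily.append(i)
        new_e = beta * s * i * dt
        new_i = e / latent_p * dt
        new_r = i / infectious_p * dt
        s, e, i = s - new_e, e + new_e - new_i, i + new_i - new_r
    return np.array(daily)

# Sample the priors listed above and run the model for each draw.
rng = np.random.default_rng(0)
n = 200
runs = np.array([
    seir(rng.normal(2.4, 0.3),
         max(rng.normal(4.0, 1.0), 0.5),      # clip to keep periods positive
         max(rng.normal(4.5, 1.0), 0.5),
         np.exp(rng.normal(-20.0, 1.0)))
    for _ in range(n)
])

# The "plume" is just pointwise percentiles across the ensemble; note the
# 5th percentile can look like "no epidemic" simply because different runs
# peak at different times.
lo, med, hi = np.percentile(runs, [5, 50, 95], axis=0)
print(f"median-curve peak prevalence: {med.max():.1%}")
```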




Now for the tuning.

For the data... I assume a significant amount of under-reporting in the official case numbers, as discussed by many. Kucharski today suggested the true value (and note this is even for symptomatic cases) was about 16x the reported values, though I'm not sure if this applies from day 1. Vallance recently made the heroic claim that the number of cases might be 1000 times the number of deaths at any given time, presumably in a desperate attempt to pretend that the mortality rate is really low, as that's what he staked his advice on. I guess if you include the newly infected, it's reasonable. My constraint on March 20 (which I'm taking as relating to the total symptomatic and recovered cases) doesn't include his value, but ranges from 22k-162k (median of 60k) on the day when the official number was 4k and deaths were 177. My early Feb constraint is centred on 20 with a range of 7-55, mostly just because the log of 20 is very close to 3. On reflection that range is probably a bit high, but note that it also has to account for further imported cases (which the model does not simulate) rather than just home-grown ones. The initial growth is exponential so it only really makes sense to apply these constraints in log space. A little simplistically, I assume uncorrelated errors on these two data points; in reality, the degrees of under-reporting might be expected to be somewhat related.

The below plot shows the prior and posterior spread over the initial segment on a log scale. The two red Hs are the constraints I've used with red circles at their means, the dark circles below are the actual reported numbers. My pale blue prior has a broad spread, and the green posterior (just tuned by rejection sampling, nothing fancy) focusses nicely on the plausible range.
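The rejection sampling step itself is nothing fancy, and can be sketched as follows. For illustration only, I've collapsed the model to pure exponential growth in the initial phase, parameterised by an early-Feb level and a daily growth rate with made-up broad priors, whereas the real fit runs the full SEIR model for each parameter draw; the one-sd widths on the constraints match the ranges quoted above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Illustrative priors (NOT the SEIR parameter priors above): cumulative
# cases in early Feb, and a per-day exponential growth rate.
log_c_feb = rng.normal(3.0, 2.0, n)      # ln(cases) in early Feb, broad prior
growth = rng.normal(0.17, 0.05, n)       # per-day growth rate, illustrative
log_c_mar = log_c_feb + growth * 48      # ~48 days from early Feb to 20 March

# The two constraints as independent lognormals: centred on 20 (range 7-55)
# and 60k (range 22k-162k), both roughly one sd wide in log space.
def log_lik(x, centre, sd=1.0):
    return -0.5 * ((x - np.log(centre)) / sd) ** 2

ll = log_lik(log_c_feb, 20) + log_lik(log_c_mar, 60_000)

# Rejection sampling: accept each draw with probability exp(ll) (<= 1).
accept = rng.random(n) < np.exp(ll)
post_mar = log_c_mar[accept]
print(f"accepted {accept.sum()} of {n}")
print(f"posterior cases on 20 March: {np.exp(post_mar.mean()):,.0f} (geometric mean)")
```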



And what about the projected epidemic? Here we have it:



Compared to the prior, it's greatly constrained with the timing of the peak ranging from about mid-April to the end of May. Again the lower limit in that plume is made up of the start of late epidemics and end of early ones, it doesn't represent a realistic trajectory in itself.

I think one thing that can probably be concluded is that the parametric uncertainty isn't hugely important for the broad policy advice. Rather like climate science, in fact :-) In the absence of strong action, we are getting an epidemic in the next couple of months, and that's really all we need to know. Though the timing is somewhat uncertain...I may investigate that further...watch this space...

Edit: I realised overnight that the way I calculated the percentiles was a bit clumsy and misleading, and I will change it for future work. Basically I calculated the percentiles of each model compartment first, with the various diagnostics being generated from these, whereas I should have calculated the diagnostics first and then presented the percentiles of them. A bit of a silly error really. I'll leave it up as a testament to the limitations of doing stuff quickly.




Sunday, March 22, 2020

Mitigation vs suppression

Next thing to do is look at suppression. Vallance recently made an absolutely terrible comment that I find hard to excuse, describing the difference between mitigation and suppression as "semantic". It is not, the distinction is absolutely fundamental to the overall strategy.

The only excuse I can find for his comment is if he just meant we have to make strong efforts to reduce R0 as much as practicable in either case. But still, mitigation (R0 greater than 1) gives us an epidemic, perhaps slowed to some extent ("flattening the curve") but still spreading through a large part of the population until it burns out through herd immunity. Whereas suppression (R0 less than 1) means we control the epidemic, immediately reducing its spread from the moment controls are introduced (though it may take a week or two to see this clearly in the statistics). This threshold of R0=1 is a clear "tipping point" in the system, much more so than is apparent in just about any climate science I've seen. It is extremely implausible that a mitigation scenario would not exceed the figure of 20,000 deaths that Whitty talked about - it would almost certainly be upwards of 100,000 unless there are serious errors in the understanding of data currently available.

Here is a comparison between mitigation and suppression. At the day indicated by the vertical line, a policy is introduced which changes R0 from the base level of 2.5 (odd final digit just for convenience) to the values shown in the legend. The epidemics play out as shown. You may be a little bit puzzled that the number of visible lines doesn't match the legend.



Let's blow up the plot around the imposition of the controls. Oh there they are! All the R0 < 1 values drop away rapidly after a small delay of a few days. 



This also shows up strongly in the number of total cases over the duration of the epidemic. Mitigation (unless you get very close to 1 indeed) always ends up with a substantial proportion of the population infected. Suppression never does. They are as different as apples and kippers. By the way, the total population here is taken to be 67 million, so the R0=2.5 case gets about 90% of us. The suppression strategies are all well under 1%.
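The apples-and-kippers point can also be made analytically: for SIR-type dynamics the total attack fraction z satisfies the standard final-size relation z = 1 − exp(−R0·z), which only has a large-outbreak solution for R0 > 1. A quick fixed-point iteration (my own sketch) shows the cliff-edge:

```python
import math

def final_size(R0, tol=1e-12):
    """Solve z = 1 - exp(-R0*z) by fixed-point iteration, starting near 1
    so that for R0 > 1 we converge to the large-outbreak root."""
    z = 0.999
    for _ in range(1000):
        z_new = 1 - math.exp(-R0 * z)
        if abs(z_new - z) < tol:
            break
        z = z_new
    return z

N = 67_000_000  # UK population as used above
for R0 in [2.5, 1.5, 1.1, 0.9, 0.7]:
    z = final_size(R0)
    print(f"R0={R0}: {z:.1%} infected (~{z * N / 1e6:.1f} million people)")
```

The R0=2.5 case comes out at just under 90% of the population, consistent with the figure above, while anything below 1 collapses to (essentially) zero.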



Suppression does need to be continued basically indefinitely, however, at least until some external change to the paradigm, such as a vaccine, comes along. And it may be impossible to keep R0 under 1 indefinitely, due to economic realities. But at a minimum, it bears careful examination. It appears to be the current policy of the govt, but is not compatible with Johnson's claim that this will all be over in 12 weeks. That's what happens when you elect an incompetent buffoon to be PM. He might be able to bluff his way through a late-night essay crisis with a bit of cod Latin (haven't we all done that in our time) but when faced with a technical problem that requires attention to detail he is utterly out of his depth.

Ferguson et al describes some suppression controls (basically the current policy of widespread social distancing, plus case isolation and home quarantining) which together achieve R0 less than 1 in their model. Let's have a look at their figures and try to work out how effective they think these suppression policies should be. Their Fig 4 is the clearest guide - assuming a basic value of R0=2.2, they claim to keep a lid on the epidemic with policies in place for about 2/3rds of the time and off for the remaining 1/3 of the time. 



The plot suggests the policies need to be in place a bit more than that I think, but let's take their word for it. I further assume that the "off" state means no controls at all, ie reverting back to R0=2.2. Back of the envelope calculation suggests that the suppressed state should correspond to R0=sqrt(1/2.2) = 0.67 but that turns out to not quite be adequate and even R0=0.6 gives a very gradual rise over time with the drops seen during 20 days of the controls not quite cancelling out the growth in the uncontrolled 10 day periods. I haven't bothered to implement the thresholding procedure as described in Ferguson et al, just done a straightforward 20 days on, 10 days off to see what happens.
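For what it's worth, the naive generation-counting version of this calculation (my sketch, ignoring susceptible depletion and all the lags in the dynamics) makes R0=0.6 look comfortably sufficient, which suggests it is precisely those lags that erode the control in the full model:

```python
import math

# Naive generation-counting check of the on/off suppression cycle,
# treating case numbers as multiplying by R every 6.5-day generation and
# ignoring susceptible depletion and the lags in the SEIR dynamics.
T_gen = 6.5

def growth_over(days, R):
    # multiplicative change in case numbers over `days` at reproduction number R
    return R ** (days / T_gen)

R_off = 2.2
# With controls on for 2/3 of the time, balance needs R_on^2 * R_off = 1
# per three generations, i.e. R_on = sqrt(1 / R_off):
R_on = math.sqrt(1 / R_off)
print(f"naive balance point: R_on = {R_on:.2f}")

# 20 days on at R=0.6 then 10 days off at R=2.2:
cycle = growth_over(20, 0.6) * growth_over(10, 2.2)
print(f"growth per 30-day cycle at R_on=0.6: {cycle:.2f}x")
```

The naive sum says cases shrink by roughly 0.7x per cycle at R0=0.6, whereas the model shows a gradual rise, so the sqrt(1/R0) rule of thumb is evidently optimistic once the lags are accounted for.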


The vertical dashed line is drawn 3 days after the change in policy and roughly coincides with the turning point in the epidemic, showing the lag inherent to the system. I think the lag in this model is unrealistically short due to various reasons which are quite obvious when you look at the equations. It isn't designed to simulate changes over such short time scales as a few days. It would take even longer to observe in reality due to observing, testing and reporting delays. It would probably take at least a week to really see it.

Note that this calculation made the possibly optimistic assumption of R0=2.2. If we revert to the original R0=2.4, and assume the suppressed state has an equivalently scaled value of 2.4/2.2 × 0.6 = 0.65 (just my simple assumption, but it seems a reasonable starting point), then in order to keep a lid on things we need to impose the controls for 22 days in every 30-day period and only have 8 days off, which is what is shown below. Or you could perhaps say 3 weeks on, one off. That's 3/4 of the time rather than 2/3rds. On the other hand, I'm sure I saw someone estimate R0=0.3 for Wuhan during the lockdown. I can't find a link to that right now though, and it did require very draconian controls that we probably wouldn't tolerate.
Perhaps it would be more realistic to find a more sustainable approach which just kept moderate control maintaining R0 ~ 1 indefinitely. Even R0=1.3 would be a huge relief and buy us many months according to my top plots. That's one for the behavioural policy experts and epidemiologists jointly to work out. I have to say I'm not optimistic about the chances of the current govt and population collectively being able to control the epidemic indefinitely though I could imagine that widespread cheap and fast testing (for the virus, or better still, for antibodies) would radically change policy options and make control much easier to achieve.


Saturday, March 21, 2020

Calibration and some interventions

In this post I'll try to do some basic calibration of the House model that I played with in the previous post. And then try a small experiment in which I vary the timing of some interventions.

I'm not trying to do anything properly scientific here, at least not yet. However, in the same way that simple climate models can be used for insight and exploration much more effectively than full GCMs, I am hopeful that this simple modelling can be informative (at least to me!) in a similar manner. Rather than trying to use real data, I'm calibrating the model against the Ferguson et al results since they simulate the full epidemic and also describe some of their model parameters.

I start with the base scenario of no interventions. I have tuned the latent and infectious periods a little to match the Ferguson et al comments reasonably, eg they say it takes 5.1 days to go from initial infection to becoming symptomatic, but people are infectious from one day prior to this so I use 4 days as the latent period. They also say their model has an average generational time scale of 6.5 days (I assume this means the time taken to get a factor R0=2.4 increase in cases) which requires a fairly short infectious period of 4.5 days in my model.
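A quick consistency check of these numbers (my own arithmetic, using the interpretation of "generational time scale" assumed above):

```python
import math

# If cases grow by a factor R0 = 2.4 every generation time of 6.5 days,
# the implied exponential growth rate and doubling time are:
R0, T_gen = 2.4, 6.5
r = math.log(R0) / T_gen
doubling = math.log(2) / r
print(f"growth rate {r:.3f}/day, doubling time {doubling:.1f} days")

# The linearised SEIR growth rate with latent period L and infectious
# period D satisfies (1 + r*L)(1 + r*D) = R0; check L=4, D=4.5:
L, D = 4.0, 4.5
a, b, c = L * D, L + D, 1.0 - R0
r_seir = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
print(f"SEIR growth rate {r_seir:.3f}/day")
```

The two rates come out close (about 0.13/day, doubling roughly every 5 days), which is why the fairly short infectious period is needed to match the quoted generation time.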

The remaining tunable parameter is just the initial state, which they say was tuned in their model to achieve the right number of deaths by 14 March. This seems an extraordinarily sensitive target (relying as it does on both the infection date and time to death of a mere handful of people with the most severe underlying health conditions) but on the other hand at least it's probably a fairly secure data point. The alternative of targeting reported/estimated infections etc is subject to uncertainty as to what that actually represents and how well it's measured. Anyway, this model does not explicitly simulate deaths, but an initial condition of i0=2e-9 in the model on 1st Jan gives reasonable outputs. This means we have about a quarter of an infected person at the start of Jan, or equivalently 10 infectious cases (plus another 15 exposed, latent) at the start of Feb. While these values may seem a little high, don't forget that as well as under-reporting, there will have been ongoing import of cases from abroad since that date, which this model does not account for. It's not a ridiculous number at any rate.

With these model parameters, we get almost 10,000 infectious cases today (21st March), or 24,000 including latent cases. In the base uncontrolled case, the epidemic peaks around 20 May in terms of the infectious cases. To enable easier comparison with Ferguson's Figure 1 I have plotted the infectious curve, shifted by 21 days, to roughly simulate the shape (but not the size!) of their death curve. This is based on the values of 5 days from symptoms to hospitalisation and a typical stay of 16 days, presented in their report. I'm not expecting close agreement, but hoping to be in the right ballpark. And it doesn't look too far off to me, by eye. The vertical grid lines are drawn at the 20th of each month (which, just to be clear, start at the appropriately labelled tick marks, ie the month names are on the 1st). It took some effort to produce plots quite as awful as those in the Ferguson et al report, which might be generated in Excel perhaps?
Above plot is mine, below is Fig 1 from Ferguson et al

I'm actually really impressed by how well this simple model agrees!

Next up, let's explore the interventions, which were applied in the Ferguson et al model (Figure 2) for a period of 3 months from April 20 to July 20. My main reason for doing this is that I was interested in exploring what sort of change to the basic R0 values they represent, as the report didn't mention that. I'm not trying to match each curve precisely, but am just interested in the general range of values that they represent. Based on my results, it seems that even the most aggressive controls represent quite a modest change, from R0 = 2.4 down to perhaps as low as R0 = 1.65 (black curves, various line styles).
It is noticeable that the difference between the two lowest curves of Ferguson et al (light brown and blue), which differ only in the addition of social distancing for the over-70s, is primarily in their height rather than their shape or timing. I reckon the reason for this is that the R0 value hasn't actually changed much between them, so the overall dynamics of the epidemic haven't changed, but the relationship between the number of cases and of critical care demands has altered due to fewer cases occurring in this particularly vulnerable older group. Therefore I'm not surprised that I haven't quite managed to match this largest drop. Decreasing R0 still further below 1.65 in the House model delays and spreads the peak in a manner that doesn't match that curve.

This plot also answers a question I was asked on Twitter: what is the effect of bringing forward the start date for these interventions to today, rather than waiting a month as was assumed in the report? The answer turns out to be quite straightforward. Starting a month earlier delays each epidemic, but does not change the overall shape. With hindsight this is obvious though I didn't realise it before doing the simulations. The early intervention just slows the growth over the initial exponential part (compared to the base case), with the magnitude of the delay depending on the magnitude of reduction in R0. Up to almost a month in the case of the strongest interventions. Of course interventions have a cost and I've made no attempt to evaluate this.
Above is mine, below is Ferguson et al Fig 2

Enough for now. A simple message is that starting controls sooner rather than later, delays the epidemic, without doing much else.

Friday, March 20, 2020

Timing is nothing

Recently mathematical epidemiologist Thomas House published a nice blogpost (see link) in which he argued that it was important to time population control measures in epidemics such as the current one. He presented a simple example in which acting too early in the epidemic would give worse results, just like acting too late. He helpfully presented his model code which I've translated into R and will be using here. (I haven't included the code as I can't work out how to do it neatly like he did on this blogging platform.)

My aim here is to critically examine his example. I believe that while it is technically correct, it has the potential to be catastrophically misleading.

His model is basically a SEIR model with the acronym representing susceptible, exposed, infectious, resistant/recovered (sources seem to differ on what the R really represents). Exposed means those who are infected but not yet infectious due to a latent period which is 5 days in this case. Thomas seems to have implemented this model with two consecutive stages for each of the exposed and infectious phases, giving 6 boxes in all. I'm guessing that this is in order to model the delay between the stages better than a simple 4 box scheme would give, but I could well be wrong about that. A detail that most readers probably don't care about anyway. The figures below replicate the ones Thomas plotted, so I've clearly replicated the model adequately.
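A sketch of such a six-box scheme is below (my own reimplementation of the structure described, not Thomas' actual code translated here; the infectious period value is my assumption for illustration):

```python
import numpy as np

def seir6(R0, latent=5.0, infectious=3.0, i0=1e-6, days=300, dt=0.05):
    """Six-box SEIR: two sequential sub-stages for each of E and I.

    Splitting a stage of mean length T into two sub-stages of rate 2/T
    keeps the mean but gives a less dispersed (Erlang-2) waiting time,
    which is presumably the point of the extra boxes.
    """
    beta = R0 / infectious
    S, E1, E2, I1, I2, R = 1.0 - i0, 0.0, 0.0, i0, 0.0, 0.0
    daily = []
    steps = round(1 / dt)
    for step in range(days * steps):
        if step % steps == 0:
            daily.append(I1 + I2)
        I = I1 + I2
        dS  = -beta * S * I
        dE1 = beta * S * I - (2 / latent) * E1
        dE2 = (2 / latent) * (E1 - E2)
        dI1 = (2 / latent) * E2 - (2 / infectious) * I1
        dI2 = (2 / infectious) * (I1 - I2)
        S  += dS * dt;  E1 += dE1 * dt; E2 += dE2 * dt
        I1 += dI1 * dt; I2 += dI2 * dt; R += (2 / infectious) * I2 * dt
    return np.array(daily)

traj = seir6(2.5)
print(f"peak infectious prevalence: {traj.max():.1%}")
```

A time-varying lockdown is then just a matter of switching beta between the R0=2.5 and R0=0.75 values over the chosen 3-week window.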




Thomas' scenario which I have replicated here is where we start with a baseline of an uncontrolled epidemic, and have the option to implement a 3-week lock-down at the time of our choosing, that temporarily reduces the reproductive rate R0 from 2.5 to 0.75. The top row of plots is a zoom into the early stages of the epidemic, the second row is the full thing. Thomas argued that timing the intervention was critical and in particular that it should not be implemented too early, because getting it right is crucial to minimising the peak. See the different lines in the left hand plots and especially that the maximum of the red curve in the lower left hand plot is lower than the others. His claim is true but I would argue that in our current situation it is also potentially misleading to a catastrophic degree.

The next graph demonstrates my point. It shows an estimate of the critical cases arising from the epidemic, which I take to be 2% of infectious cases at any moment in time. This is a very rough estimate, but the value must surely be significantly greater than the overall mortality which is generally expected to be about 1% (when properly treated). The maximum capacity of intensive care beds in the UK is about 5000 (from the Ferguson et al Imperial College report) and I've plotted this as the cyan line.




Oops.

I hope you all see the problem here. The red peaks may be lower than the others, but none of the model results are in the same postcode let alone ballpark as the capacity. If capacity was some 20 times higher (dashed line) then Thomas' analysis would be entirely correct and important. But actually, "let's wait a month or two before we do anything" is in this example a catastrophic policy and although it's still technically true that a greater proportion of cases are properly treated in his scenario (ie the area under the cyan line and also under the red curve), the vast majority are not. In the base case, 7% are treated and this rises to just over 10% in the best case. Well, while those additional 3% of victims would surely appreciate it, it's a bit like taking a band-aid to a trauma ward and fussing over where to optimally apply it.

I believe a more appropriate conclusion to draw from this analysis is that the short sharp shock of a temporary lock-down is basically useless as a strategy, and we need to develop a better solution. My intuition suggests to me that such a better strategy should be implemented as soon as possible rather than waiting for the epidemic to grow first, as I'll investigate in future posts.

This model appears to give somewhat worse results than the Ferguson et al report, which uses a vastly more detailed model with lots of different sub-populations rather than just one homogeneous bunch of people. Their results suggest only exceeding capacity by a factor of about 20 in the case of no action, versus the 40 here (ie the dashed cyan line would have to be twice as high again to cover the blue peak). Maybe they expect a higher proportion of isolated older people to escape the disease. Anyway I do not expect this model to provide precise and accurate answers, but it does look like a broadly reasonable tool.
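As a cross-check, the quoted overshoot can be inverted to give the peak prevalence it implies (my arithmetic only, using the 2% critical fraction and ~5000 beds from above):

```python
# Working the overshoot backwards: demand is taken as 2% of infectious
# cases at any moment, against ~5000 ICU beds (figures as used above).
population = 67_000_000
icu_beds = 5_000
crit_frac = 0.02

overshoot = 40  # the no-action case here exceeds capacity ~40-fold
peak_prev = overshoot * icu_beds / (crit_frac * population)
print(f"implied peak infectious prevalence: {peak_prev:.0%}")
```

That is, a factor-40 overshoot corresponds to roughly 15% of the population being infectious at the peak, which is in the plausible range for an uncontrolled epidemic at these R0 values.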


Wednesday, March 18, 2020

How they got it so wrong - a theory

Think I will restrict the coronavirus stuff to appearing here, it's not really our professional work at blueskiesresearch.org.uk and I only posted there because of the Rmarkdown to Wordpress publishing thing which isn't actually as useful as I'd hoped it would be.

Anyway.

Recent events have got me thinking - how could the Govt have got it so wrong with their COVID-19 strategy? What has changed (if anything) in the science, and their response? And what should they actually be doing? Especially in light of the research published a couple of days ago, which seemed to merely state the obvious. The Lancet editor Richard Horton has been tweeting critically (eg quoted in this post) and wrote this article in the Guardian:
Indeed, it didn’t need this week’s predictions by Imperial College scientists to estimate the impact of the government’s complacent approach. Any numerate school student could make the calculation. With a mortality of 1% among 60% of a population of some 66 million people, the UK could expect almost 400,000 deaths. The huge wave of critically ill patients that would result from this strategy would quickly overwhelm the NHS.
The UK has a long-standing pandemic plan - specifically for flu, but while COVID-19 is technically not flu, it is a respiratory disease with strong similarities. The UK has gone through a number of flu pandemics, as recently as 1957 which many of the older generation (including my mother) remember well. 

It is explicitly baked into the strategy that 
Stopping the spread or introduction of the pandemic virus into the UK is unlikely to be a feasible option 
 (Appendix 1, here)

While the plan does mention as many as 750,000 deaths, it may be that people didn't really think that sort of figure was plausible. A few tens of thousands maybe, a bit worse than a normal bad flu year, but hardly existential. So, the basic paradigm seems to be, the govt only ever considered the possibility of the epidemic running through the population, killing as many as it happened to, according to its specific parameters for spread and lethality. Strategies for controlling the epidemic at a lower level, to eliminate it before it burnt itself out, just weren't in their solution space.

Hence, “taking it on the chin” and “herd immunity”. That's just what we do, cos that's what we have always done, cos that's how these things work. The pandemic plan considered the necessity of dealing with the excess deaths, and things like how to reduce the spread to some extent (social distancing and cocooning of particularly vulnerable people are beneficial strategies, albeit with limited effect). But it just never considered whether it might be appropriate to stop the disease entirely.

But actually, when you do the sums, you see that a death rate of 1% (and I'm getting bored of pointing this out, but this estimate is only valid when the incidence of illness is low and victims get good healthcare!) occurring over a time frame of a couple of months is horrific and will see piles of bodies in mass graves, with the NHS totally overwhelmed in the meantime. And we also see from other countries that an alternative approach is possible, one in which stricter controls on social mixing, combined with more aggressive testing and quarantining of contacts of cases, can actually control the epidemic. The key is in whether we can get R0, the basic reproductive rate of the virus, below 1 or not (and keep it there on a sustainable basis). If we can, then we can beat the disease with only a few hundred or thousand deaths, rather than hundreds of thousands.

So that's my theory as to what “changed”. It wasn't the science of the disease. It was the demonstration of an alternative solution, one in which the epidemic is controlled at the outset. If the epidemic had simply overwhelmed China first, before spreading across the world like a dark cloud, swamping nation after nation as it did so, maybe we would have all accepted that a moderate percentage of the population were going to die, and the rest of us would pick up the pieces afterwards. But we've now seen that this is not inevitable, and it seems awfully defeatist to not even try to do better.

BlueSkiesResearch.org.uk: Snap

A week ago, I blogged the graph on the left. Last night, one of the epidemiological modelling teams advising the Govt published the graph on the right. Billed as "new research", it has formed the basis of the new Govt guidelines on social distancing to deal with the coronavirus pandemic. Apart from the choice of colours, probably the most significant difference appears to be that I explicitly considered the possibility that we might increase our healthcare capacity over time (the green line rising in the left plot), whereas the modellers in Imperial College did not (the red line along near the x-axis).

The contrast between these two sets of projections of the epidemic, and the fatuous “flatten the curve” plots that I discussed here is stark. While my calculation was relatively trivial, it’s good enough to indicate that hoping to “flatten the curve” sufficiently to cope with the full progression of an epidemic is foolish at best.

It's very curious to me that the IC research has been billed as "new science" that justifies a new approach from the Govt. All the underlying data has been known for weeks, months even. The Editor of the Lancet is one of those who has been particularly vocal in pointing this out,
also here
and a horde of twitterati have been saying the same.
It would be interesting to know what has really changed. Is it just that the Govt realised that its genocidal policy of "taking it on the chin" wouldn’t be acceptable once people realised the consequential death toll?

One remaining problem with the IC research is their use of the standard mortality rate of about 1% overall that makes no attempt to account for the obviously deleterious effect of running out of hospital beds. If 15% require intensive care, and it’s not available, it is ludicrous to believe that only the same 1% will die.

Sloppy of journalists to not query this more forensically.