Thursday, April 02, 2020

What can we learn from Wuhan?

Epidemiology Thursday again, so my boss is allowing me to spend some more time doing calculations. I have set up a proper MCMC sampler to do the parameter estimation more efficiently, which is also helpful for churning through a lot of different cases.
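The post doesn't spell out the sampler, but for readers curious what "a proper MCMC sampler" involves, here's a minimal random-walk Metropolis sketch in Python. This is a generic illustration, not the actual code used here; the toy target is a standard normal rather than the SEIR posterior.

```python
import numpy as np

def metropolis(log_post, theta0, step, n_iter=5000, seed=0):
    """Minimal random-walk Metropolis sampler (illustrative sketch only).

    log_post: function returning the (unnormalised) log posterior density.
    theta0:   starting parameter vector.
    step:     standard deviation of the Gaussian proposal.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        prop = theta + rng.normal(0.0, step, size=theta.size)
        lp_prop = log_post(prop)
        # accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

# toy target: standard normal posterior in one dimension
chain = metropolis(lambda t: -0.5 * np.sum(t**2), [0.0], step=1.0)
```

In practice the log-posterior would be the SEIR model's fit to the death data plus the log-priors, and the chain would be thinned or subsampled to produce the plume of trajectories shown in the plots.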

Wuhan (Hubei) is a great test case for this sort of modelling and parameter estimation, as we have a good long time series. I know people have some doubts about the reliability of the data in the current situation, but that doesn't really affect what I'm doing. There are doubts about the reliability of all data in this sort of situation!

The main aims in this post are to see:

(A) Can we model the epidemic curve, including the effect of the lockdown?
(B) What parameters do we obtain from fitting the data?
(C) How soon after the lockdown can we estimate its effectiveness, and how rapidly do we learn over time?

C is really the fun one. But starting at the beginning...

The model is the same old SEIR, with uncertainties primarily in R0 during the initial growth, and Rt after the lockdown (that's still the reproductive ratio R, but applying after a given time t). I also have to allow the initial state to vary (it is strongly related to R0) in order to get the right size of epidemic at the right time. Including uncertainty in the time constants of the model doesn't actually make much real difference here, since changes in R compensate for changes in the reproductive time period anyway. I've allowed for half a day of uncertainty (at one standard deviation) in the latent period. The other parameters that I could have allowed to vary are held fixed for now; there's no point having many more degrees of freedom than constraints, as it just makes the estimation process a bit less well behaved.
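For concreteness, a discrete-daily SEIR model with a reproduction number that steps from R0 down to Rt at the lockdown can be sketched as below. All parameter values here are illustrative placeholders (including the latent and infectious periods and the population size), not the values fitted in the post.

```python
import numpy as np

def seir(R0, Rt, t_lock, E0, days=120, latent=4.0, infectious=7.0, N=6e7):
    """Daily-step SEIR with a step change in R at the lockdown day.

    Illustrative sketch: parameter values are placeholders, not the
    fitted values from the post.
    """
    S, E, I, Rem = N - E0, E0, 0.0, 0.0
    sigma, gamma = 1.0 / latent, 1.0 / infectious
    infected = []
    for day in range(days):
        R = R0 if day < t_lock else Rt        # step change at lockdown
        beta = R * gamma                      # transmission rate implied by R
        new_E = beta * S * I / N
        new_I = sigma * E
        new_R = gamma * I
        S -= new_E
        E += new_E - new_I
        I += new_I - new_R
        Rem += new_R
        infected.append(I)
    return np.array(infected)

# example: strong growth, then suppression at day 40
curve = seir(R0=3.0, Rt=0.7, t_lock=40, E0=10)
```

With Rt below 1 the infectious population peaks shortly after the lockdown (the latent compartment introduces a few days' lag) and then decays, which is the qualitative behaviour the fits below rely on.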

For priors on R0 and Rt, I used N(mean, sd) = N(3,2) and N(1.5,1) respectively. The prior for R0 is very broad, and although I do expect the lockdown to have an effect, I don't know how much. Rt could easily be less than 1 under this prior, but it's rather more likely to be greater than 1. I could have implemented the change as a scaling factor on R0, which might have been better, but I'm not altering it now.
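These priors translate directly into a log-density for the sampler. A sketch, assuming independent Gaussians truncated at zero (since R cannot be negative); the truncation is my assumption, and normalising constants are dropped because they cancel in MCMC:

```python
import numpy as np

def log_prior(R0, Rt):
    """Independent Gaussian priors: N(3, 2) on R0 and N(1.5, 1) on Rt,
    truncated at zero. Normalising constants omitted (they cancel in MCMC)."""
    if R0 <= 0 or Rt <= 0:
        return -np.inf
    return -0.5 * ((R0 - 3.0) / 2.0) ** 2 - 0.5 * ((Rt - 1.5) / 1.0) ** 2
```

Under the untruncated N(1.5,1) prior, the chance that Rt < 1 is about 31% (one half standard deviation below the mean), so suppression is plausible a priori but growth is more likely, as the text says.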

I should also mention for completeness that I massaged the Hubei data just a little to remove obvious errors. I could only find daily cumulative data starting on the 23rd Jan, so I had to add a couple of weeks prior to that as the first death was reported on the 9th and there were 17 by the 23rd. I also eliminated some obviously erroneous zeros in the daily death figures by shifting 1/3rd of the deaths from the two immediately adjacent days. I'm working primarily in log-space so zeros are particularly irksome. And anyway they are just missing data, not real days with no deaths (except at the very start).
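The zero-filling step might look something like the following sketch (my reading of the procedure described above: an erroneous zero is filled by taking a third of the deaths from each of the two adjacent days, preserving the total).

```python
import numpy as np

def fill_zeros(daily):
    """Replace erroneous zeros in a daily-deaths series by borrowing a
    third of the deaths from each of the two adjacent days.

    A guess at the massaging described in the post; totals are preserved.
    """
    out = np.asarray(daily, dtype=float).copy()
    for i in range(1, len(out) - 1):
        if out[i] == 0 and out[i - 1] > 0 and out[i + 1] > 0:
            take_left = out[i - 1] / 3.0
            take_right = out[i + 1] / 3.0
            out[i - 1] -= take_left
            out[i + 1] -= take_right
            out[i] = take_left + take_right
    return out

cleaned = fill_zeros([10, 0, 14, 5])   # zero on day 2 gets filled in
```

The point of keeping every value positive is, as noted, that the fitting works in log-space, where a zero is not merely awkward but undefined.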

And without further ado, the fit to the Hubei data, with the lockdown date (when R changes) indicated and the before/after values shown on the plot. As previously, the plume is the 5-95% range, the median is the magenta line, and the thin blue lines are just a few random samples from the ensemble. The red circles are the obs after the massaging I just described.

That's pretty amazing IMO, for such a simple model to do such a good job. The parameter values seem very reasonable too, compared to the literature. Yes, I know there's a bit of extra whitespace at the top of the plot; the reason is to put all of the plots on the same axes for easy comparison.

How quickly could we learn about Rt, the value after the lockdown? One week after, perhaps? Here's the result when we use the first 7 days of data after that date.

It's a big fat nope. The Rt value shown there is simply the prior choice I made (the slight deviation from 1.5 is due to sampling/rounding error; the exact mean in this ensemble is 1.551). Therefore this plot just shows the range of futures implied by my prior assumption, with some trajectories heading up to herd immunity without much hesitation, and others bending down towards suppression very rapidly.

How about 2 weeks of data? Does that give us a hint? 

No, not really. If you know what is coming, you could claim there's a slight reduction in the growth rate over the last few days of data, and this is reflected in a slightly lower estimate of R0, but it doesn't constrain Rt in this model. In fact, the time scale for Rt to affect deaths does theoretically allow it to influence the last few data points, and thus for these data to help in the estimation of Rt, but they aren't sufficiently strong evidence to really tell us much.

Three weeks post lockdown, however, and there appears to be something going on. Rt is nudged lower than the prior estimate, and its uncertainty is starting to reduce a touch. Even in the worst case of an ongoing epidemic, this is slower and smaller than in the previous results.

Four weeks, and it's clearer, but even then there is still a substantial chance of Rt > 1.

Five weeks down the line, it is much better bounded:

That was a very unlucky sample line that shoots off upwards. I checked what was going on in case of a bug or lack of convergence and it just happened to lie at the 99.8th percentile of the posterior 5000-member ensemble. But even so, it's not getting to the top of the graph by the end of March.
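That percentile check is just a rank computation over the ensemble; a sketch with dummy data standing in for the final-day trajectories:

```python
import numpy as np

# dummy stand-in for some scalar summary of the 5000 posterior trajectories
rng = np.random.default_rng(1)
ensemble = rng.normal(size=5000)

# suppose the suspect sample is the 4990th smallest member
value = np.sort(ensemble)[4989]

# fraction of the ensemble strictly below it, as a percentage
percentile = 100.0 * (ensemble < value).mean()
```

An extreme-but-legitimate member like this is expected in any large sample, which is why checking its rank (rather than just eyeballing the plot) distinguishes bad luck from a convergence failure.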

Usual caveats apply: it's a simple box model, and I've made a few assumptions that might not be fully justified. But overall I'm really impressed with how well it works.


Phil said...

I should put this on an earlier post, but here it is.

R0 is 2.38 (95% CI: 2.04−2.77)

With many cases unreported, the virus appears to be better at spreading than it actually is.

Unknown said...

An Australian paper indicates any lockdown less than 70% efficient is little better than doing nothing. How efficient is your lockdown? -JCH

Bryan Lawrence said...

I vaguely remember a Chinese paper/preprint that had multiple stages of "lockdown", not just the one ... and they thought it was the second or third (can't remember) which actually dragged R below 1 ...

James Annan said...

Thanks Phil, I think that was the source of the IC choice of R0=2.4. Interesting that it's based on early Wuhan data (up to the 23rd Jan only), and looked at infections rather than deaths. It does seem that the deaths showed a steepening after that date which suggests the infections might have been miscounted.

James Annan said...

JCH - well there was that recent report talking of a 73% reduction in contacts. That makes it markedly less efficient than Wuhan but better than nothing. How that translates into R is another matter.

Bryan, yes these things are generally a bit spread out rather than all happening at once - there's a nice figure of this for Europe in the IC report I blogged about recently (which looked at different controls in some detail). So it's perhaps open to debate where best to put my break-point, but 23rd seems to be the most widely accepted date for the main effect. That seems to be when the main quarantine/blockade started and it's the breakpoint used in that Science paper Phil linked to.

jules said...

Bryan - the model is quite simple and behaves a bit differently to reality. When we were messing about with the Lombardy data (hopefully watch this space soon for more on other areas!) there was quite a bit of uncertainty about where to start, as the lockdown was quite gradual. It seems to work best if we start the intervention from around the first date of a serious attempt at doing something, and doesn't fit well if we stick the intervention later, at the most serious lockdown date. I don't think there is enough information in these data or our model to define a whole series of different Rs. We already have a good fit with just one, so would surely start over-fitting if we added more...

James Annan said...

Yes, though also I've improved some minor details since then, so am hoping it will now work well across different regions without too much fiddling. I reckon the death process/time scale is much more realistic now, helped also by the latest IC work with Bayesian estimation.

Unknown said...

Here is the preprint: