Thursday, March 26, 2020

Fitting the COVID-19 SEIR model to data, part 2

TL;DR version: when the model is fitted to the data, the doubling time is around 3 days. This is far more rapid than the Imperial College (IC) modelling, and (if correct) could mean that the peak would be upon us faster than they predicted.

A continuation of my investigations with the SEIR model which I have presented in previous posts especially here and here. Use the label for the full set. I'm a little surprised there hasn't been more competent statistical modelling appearing yet. I know there are plenty of extremely talented statisticians around who can do this sort of thing in their sleep, and yet the modelling that has appeared recently seems a The Oxford study was plainly silly (at least in its presentation) and even the IC study which is the direct foundation of Govt policy didn't do any sort of meaningful uncertainty analysis. I'm a bit less embarrassed to show my stuff given the level of what the pros have done.

I am however a little embarrassed that it took me so long to discover a significant issue with the IC study which I will now discuss. A while back, Vallance mentioned a 5 day doubling time (on this youtube video, starting from 27 mins in). And people were puzzled. Where did this come from? The data show a much more rapid growth, eg on this excellent analysis which is regularly updated in the FT:

Most countries are close to doubling every 3 days, which is a much more alarming position.

Well, it turns out that 5 days is baked in to the parameter values of the IC model! The doubling time is a function of R0 and the latent and infectious periods, which they set according to their interpretation of the literature and research. Importantly, through their use of a single data point to calibrate their model, they made no attempt at all to fit the observed growth rate, merely setting the start point of the epidemic in their model at the level that ensures their simulation had the correct number of total deaths by mid-March). So, I am trying to tune my model in more detail in this post to take account of the information we have regarding the growth rate.

I've taken a fairly subjective view, which I believe is necessary given the limitations of both the data and the model. The simple model I'm using cannot be expected match the data precisely but I'm trying to get the general gist of it, and allow a reasonable range of plausible model fits. I have changed my priors from the previous attempt, in order to provide more support for shorter doubling times as this is clearly justified by the international data shown above. I have also done an explicit calculation of deaths (based on a proportion of infected cases dying after a significant time delay) and am using both the death data and the reported number of cases in my model fitting, with the latter adjusted as reasonably as I can do for under-reporting.

With no further ado, here are the results. I'm just using a simple rejection sampling (likelihood weighting) approach, with uncertainties in R0, latent period, death rate and starting size. No real need for anything more sophisticated with only 4 parameters and a quick model. Here is my prior predictive distribution, and posterior, over the initial segment where we have observations, presented in log-space as this changes the exponential growth to linear and clarifies what the model is and isn't doing....

Total (UK) cases first. All ranges presented are 5-95%, so there is a chance of being outside, but not a great one:

I have just used the start and end of each observational time series, there's no point in using all the intermediate data points as they are very close to a straight line anyway. Although, one could probably argue that the growth rate in the very very earliest stage might be different to that generally applicable (eg due to contact tracing and the atypical nature of incoming infections), and I might come back to that in future work. For the number of reported cases, I've assumed that the early numbers were fairly reliable and the later one substantially under-reported along the lines of Kucharski, as I mentioned earlier. I should also mention that I make a further assumption of 50% of cases being symptomless and a death rate of about 1% (also an uncertain parameter in this work) for the symptomatic cases, ie 0.5% of all infections. I do have some uncertainty in this death rate parameter so it could be a bit higher or lower. In the above graph I have plotted the official data as black circles, the red H marks indicate the estimates I'm using in the fit, together with their uncertainties (ie, the uncertainties that I have subjectively attached to them, these are not truly uncertainties of the data as such). There is lots of room to quibble with my judgements!

Total deaths next, and I set the axes to give the same aspect ratio, which is why the labels are a bit odd (ok, laziness played a rĂ´le too):

For these death data, I'm assuming the reported values are themselves reliable (up to the day before yesterday!) so not adjusting them directly. But still allowing for significant mismatch in the fit. 

The fitted parameters suggest a slightly higher R0 of about 2.6 and a latent period of only 2.5 days. These, together with the infectious period which I fixed at 4 days, combine to give a doubling interval of just under 3 days which is even shorter than in my adjusted prior let alone the 5 days of the IC report. Clearly these data (at least, my interpretation of them) point towards a rapid growth rate. You can see in the above graph I'm still not matching the growth rate in deaths which is actually doubling every 2.2 days over this period. These death data seem to be going up rather faster than the reported cases even after I have adjusted the latter for estimated under-reporting, and the model does not achieve this. I don't think this can be a model limitation per se as simple theory implies that all the growth rates of all variables should be the same in this early stage of exponential growth - cases, symptomatic cases, serious illness, death, recoveries. All of them should double at the same rate, and therefore be linked by simple proportionality rules. But in fact the first few deaths were at a rate of 1 per hundred current cases, now it's more like 5. Maybe the under-reporting is worse than I think? Food for thought surely.

Now some projections for the coming summer. Here are the epidemic curves under a business-as-usual course of taking no action. I'm showing first the total number of cases, then the total deaths. In the first of these, I also showed a handful of the individual curves to show the typical shape - as I mentioned previously, it is not the case that we might follow along the lower bound of the 5-95% range, this is instead the envelope within which most of the curves sit. All of them give a large rapid epidemic.

It's over by the end of May in terms of infections - even the start of May in some cases - but the bodies will probably continue to pile up for another few weeks after that:

Remember that this is just assuming a 0.5% death rate, which is not particularly pessimistic. I think the model probably over-predicts the proportion of us that get the disease, but not by a huge margin. And 0.5% might well be optimistic in such a compressed time frame when we don't have capacity to treat people. If you don't like the notation for the axis, it's about 250,000-500,000 deaths.

Now, the obvious question is, what should we do about this? The results of Ferguson et al suggest that changes in social behaviour can reduce R0 significantly, and this is easy to simulate, though the reduction itself is just something I have to impose (based loosely on fits to Ferguson et al).

Here are two possible futures. I've assumed we either achieve (a) a 50% reduction in R0, which is pretty optimistic, or (b) a 70% reduction in R0, which is incredibly optimistic. Both of these are applied for a 3 month period (the shaded region) and then released. The results are.....not really all that great. 

If we suppress strongly now (and then relax the controls), the epidemic comes back for a second go later, rather as Thomas showed. The red curve is slightly better than the others, but the total deaths aren't that different. Of course this analysis completely ignores the possible development of treatment or vaccines that could completely change the picture. Let's hope so cos these pictures are not really very appealing!

I'm sure I could think of more things to do, but this is enough for now. Comments welcome. It's a shockingly compressed time scale compared to what the IC group have predicted - they had critical care demands peaking in June! - so I hope their judgement over parameter choices proves to be more appropriate than what this model fitting achieved.


William M. Connolley said...

Just checking... these are UK only numbers? So you results show ~300k UK deaths?

James Annan said...

Yes, as a possibility at least. Kucharski suggests 16x under-recording and he's only talking about symptomatic cases, which may be 50% of total. Whitty (or maybe it was Vallance) also talked about 1000 true cases for every current death. It's not an outrageous number. You do know that basically anyone who is ill but not at death's door (and some who are is just told to go home without any testing or treatment.

James Annan said...

Was supposed to be a link to this story

William M. Connolley said...

Well, your *minimum* is ~175k. Which I find surprisingly high.

Neil Edwards said...

Hi James, you should post this on MedRXiv, if you haven’t already, to make sure it’s not just seen and criticised by climatologists
(Neil E)

James Annan said...

Sorry wmc completely misread your previous comment, ignore my earlier reply. 175k deaths, yes, if it's anything around 0.5% of total cases and we'll mostly be infected. What else would you expect? I would love to be wrong but I really don't see much basis for hoping for better. Absent some pretty startling interventions that is. We may invent a cure in time (but time is short). We may also end up with lots of people needing some basic care and getting nothing. This is already happening and we are still very much at the start of things.

I suspect this model infects too many people due to not modelling any sort of spatial or social inhomogeneity. There may be a decent chunk of the population who escape it, significantly larger than the 10-20% that the model permits under a full scale epidemic. But all the evidence points to a very wide spread.

Neil, thanks for the idea. Didn't know that was a thing. I'll look it up. Though I"m supposed to be working :-)

David Young said...

I personally think this whole episode will go down in history as one of the biggest ever examples of mass hysteria. I found an organization that tracks mortality in Europe. They just updated their data for week 12 of this year.

1. There is a strong seasonal cycle in mortality in those over 20 years old. 2016-17 and 2017-18 saw very strong increases in mortality over the winter. So far 2019-2020 is a piker. And there is no evidence I could see that mortality is above the expected numbers over the last few weeks.
2. Italy does show a spike the last few weeks but well below the peak in 2016-17.
3. Actual mortality tends to track the "normal" much better during the summer.

In investigating the flu tracking I discovered that there is a lot of ignorance. CDC projects that in the US there have been between 22,000 and 56,000 deaths so far this season. Those numbers according to someone on my team are based on random testing and followup. Only about 220,000 people have even been tested for flu this season in the US. Likewise the death rate is quite uncertain.

There are also issues with how death certificates are recorded. It's often ambiguous if a death was due to some chronic condition, a virus, bacterial infection, or some combination. If you test positive for corona virus and die, you will be counted as a fatality due to the virus. In the case of the flu, that doesn't happen. For the flu, there is no such presumption. What we do know is that being old is very hazardous to your continuing among the living. More and more people are living with chronic conditions. Many are immune compromised to one degree or another. Virtually everyone will become diabetic before they leave us. Getting old sucks and its not clear if there is a significant increase in mortality due to this new virus.

In short, there is massive ignorance and a panic response that is based on extreme precautionary thinking. Cuomo (New York governor) basically said that if he could save 1 life, everything was worth it. This is childish reasoning.

The media are not helping either. They prominently display fatality rates that are inflated perhaps by orders of magnitude, talk about overloading the medical system without any real knowledge of conditions on the ground, and uncritically trumpet anecdotal reports from a single doctor in New York who claims the situation is dire. This is yellow journalism at its finest.

The problem here is economic health is also critical to physical health. Are there any rational people left?

Everett F Sargent said...


Your numbers are still off by over an order of magnitude. I'm assuming you are getting the dailies from here?
Now, re-rank from max-to-min, there are less then 7 countries to worry about at this point. You should at least do that one and notice the plateaus (cumulative deaths time series). Then plot deaths versus confirmed cases (hint confirmed cases are almost useless due to lack of common spatio-temporal standards).

I don't have a model, and further, I don't want a model, not if their range is like three (or four) orders of magnitude. IC and Oxford are orders of magnitude models IMHO. Now after we get past the real peak, and with much 20/20 hindsight maybe then and only then.

The USA has a current doubling time of 2.5 days (deaths). I do 3, 5, 7 and 9 day centered log-normal regressions, that gives me a temporal daily series of doubling times.

If you really want to screw up the models just modify your Y0 axis values for each nation state. That way you get whatever doubling times per your guessed at y-axis origins. Exponential's are like that don't you know?

The only real tipping point is ventilators and/or ICU beds, that's your only real do nothing alternative.

Of course, my only real assumption is shelter-in-place, I could be wrong, but that is the only really meaningful strategy to date IMHO.

PeteB said...

"But in fact the first few deaths were at a rate of 1 per hundred current cases, now it's more like 5. Maybe the under-reporting is worse than I think? Food for thought surely."

Certainly in the uk, it's only hospitalised cases that are now getting tested and recorded as infections. In the early days there was much more traditional contact tracing / testing, so they would have found a lot bigger proportion of the infections.

Everyone that gets the symptoms at home, the advice is to not even call 111, unless they are so ill they need intervention.

Phil said...

0.5% is beyond reason. South Korea is trending to a case death rate of over 1.5%. This is probably the useful information available not for an estimate, as South Korea is controlling the virus with testing and isolation of only the infected and exposed, rather than the whole population.

Notice that the death rate of closed cases (recovered or death) is 3%, and trending down.

Notice that the current deaths/total cases is almost 1.5%, and trending higher.

When all cases are closed the two numbers will be the same.

Phil said...

DY: The likely death of well over 1% of the world isn't hysteria.

South Korea has a good heath system. Not overloaded. Much of the world may not be so lucky, on one count or the other.

Dr Annan. Maximum infection rate is of 80% is not realistic. People have different behaviors, and the R0 depends on behavior. The social butterflies will mostly all be infected. R0 of a group playing beer pong in a bar is fairly high. People that live alone or with a partner, don't attend large gatherings are fairly unlikely to be infected. R0 is less than 1.0 for a fraction of society.

R0 is not a constant, but an observation. Varies with society and within a society.

James Annan said...

Phil, there is lots of uncertainty over the death rate, due to the uncertainty of asymptomatic/mild/unrecorded cases. My 0.5% is actually 0.5±0.2% at 2 sd so the range does include the possibility of higher values.

TBH it's s a pity people are looking at the deaths, people know they are uncertain, the results generated here can be scaled up and down and don't affect the dynamics.

The 3 day thing however is a strong challenge to existing govt advice and policy. The IC modelling made some very unfortunate choices based on limited evidence and it seems they didn't adequately check the plausibility of their results.

James Annan said...

Phil, I do agree that this model probably over-predicts the penetration of the epidemic somewhat through treating the population as entirely homogeneous. I'm sure I said that previously somewhere. However, my impression is that this doesn't affect the risk by a huge factor. Even with stringent attempts at isolation, the IC model suggests a decent chunk of the elderly will still get it (under the cocooning and herd immunity plan). Maybe half as prevalent as the rest of the population, far from negligible.

Phil said...

South Korea isn't exactly the same as the UK. Average age is slightly higher and as case deaths are a function of age, South Korea should have a slightly higher death rate than the UK. With all things equal, which they are not, of course. The infected group in South Korea is both more female and younger than average for South Korea, due to the population that was in attendance of services at Shincheonji Church of Jesus, the location of the majority of infections to date. Females are less likely to die of many infections due to lower smoking rates and due to better genetic structure. Females have two copies of the X chromosome, where some of the immune system genes are located.

Males are more likely to be in bars playing beer pong. This might be due to the fact that some of the intelligence determining genes are also on the X, explaining why women are smarter than men.

South Korea average age is 41.8
UK average age is 40.5.

Phil said...

Dr Annan

"My 0.5% is actually 0.5±0.2% at 2 sd so the range does include the possibility of higher values. "

Not enough higher. Mostly likely range is 1.5% to 3%. Look at South Korea. Almost all cases found by testing, not by sickness.

Phil said...

The "the cocooning and herd immunity plan" is unforgivable ignorance and stupidity.

Boris has reportedly gotten the virus. Karma has ran over his dogma.

Phil said...

Oh, and beer pong.

James Annan said...

I'm certainly not defending the cocooning and herd immunity plan. I do wonder if we have any better alternative *now* though.

A lot hangs on how effective social distancing is, and what R0 is.

Importantly, R0 is rather poorly identified by the time series data as the growth rate is a function of all 3 of the major parameters in this model. I don't particularly trust the Ferguson assumptions about what social distancing achieves in terms of reducing R0, but have nothing better to go on. It may be really really hard to keep R0 down on a sustainable basis. I just don't know.

Phil said...


"The Hammer and the Dance"

James Annan said...

That's just an incredibly verbose repackaging of the IC work. It doesn't prove that keeping R0 around 1 is plausible in the long term for us. It just assumes it.

I'm not saying we shouldn't try. All delays are useful. I'm just saying I'm not confident we can "win", in that sense.

Phil said...

South Korea has R0 below 1. Number of cases is falling.

A few other states could be listed here.

Hong Kong had R0 below 1 until March 18th or perhaps a bit before that. I'd bet they get back to that.

China has R0 below 1. Information quality is suspect, but there isn't a full outbreak with people dying outside hospitals.

Above are listed in the order of quality of information.

Case tracking and isolation of possible infected is far more sustainable than shelter at home orders. Banning large gatherings and such is very possible.

Likely in a year or two there will be a vaccine, and then we can drop even most of that.

Phil said...

Actually, Hong Kong still might have R0 below 1. Most cases are due to residents returning from abroad.

"The chief executive, Carrie Lam, said in a statement on Wednesday night: “We believe that a large number of Hong Kong citizens will continue to come back [to] Hong Kong in the following weeks because of the worsening situation overseas and hence confirmed cases will inevitably continue to increase.""

HK may tighten up a bit to keep the number of cases low enough.

Case tracking and isolation worked in Ebola outbreaks. And SARs. And ...

James Annan said...

I agree it's theoretically possible. I would say the jury is still out on how sustainable it is, especially with an epidemic that is now far outside the bounds of contact tracing. Some disquieting snippets of information from China suggesting their suppression isn't as complete as claimed. Different social systems will also change R0, as is apparent in Japan, unless they are just completely making things up. We can't just turn Japanese overnight, nor would some of us enjoy it.

As for SARS, ebola - different disease, different approaches may work. Being infectious prior to symptoms makes things hard. Also, being infectious but not seriously ill, so people keep spreading it. And methods of transmission. Each disease seems to be slightly different.

Everett F Sargent said...

Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator days and deaths by US state in the next 4 months

So ~81K deaths by July in the USA alone.

Better hurry up and place your bets on the Roulette Wheel Of Death (RWOD). You could be the lucky wiener, fame and fortune to follow, if you are still alive, that is.

Phil said...

The rich countries of the world only need to sustain suppression until a vaccine is developed.

Even if a vaccine isn't successful, which I hope is unlikely, eventually enough people catch it so rising rates of immunity reduces the measures needed to keep the virus suppressed. Hopefully the young first, as they are much less likely to get seriously ill or die.

Also: Consider that successful suppression puts strong evolutionary pressure on the virus. A strain that has fewer symptoms will be far better at avoiding suppression for longer periods of time. Over many decades, this virus will evolve to become another strain of the common cold. Likely even if we don't manage to suppress it locally. Where do you think the common cold viruses come from?

The poor countries of the world are very unlikely to suppress it. Very Sad image needed here.

China's information isn't highest quality, to be polite. Yet a uncontrolled outbreak would be very hard to hide. I'd suspect they have a situation much like HK or SK, with a small number of local cases and some imported cases. China isn't likely to be doing as well as they claim. Yet I think it is clear that China is likely to have R0 below 1. Or there would be people dying in the street waiting outside the hospitals. Which isn't happening.

David Young said...

I would just point out that claims such as "1% of the world is going to die" are terrible sound bites to base policy on. The question is what will the excess mortality be over "normal" mortality levels. So far, there is no evident excess mortality and indeed there is much less mortality than in 2016-17.

Expected mortality in the UK amoung those 60-69 years of age is 0.99% per year. Amoung those on the Diamond Princess passengers in this age range, there were 201 who tested positive and none have died. Nic Lewis estimates 0.11% mortality from this data taking into account false negative results which apparently are about 30%. Likewise for 70-79 age group (where there were 3 deaths out of 367 positive tests), the mortality implied by the DP data is much lower than the yearly expected UK mortality rate even assuming everyone gets exposed.

Every death is sad, but basing policy on panic and models with high uncertainty and worst case scenarios is dangerous. Right now, the top line mortality data trumpeted in the yellow journalism rags is very strongly biased and unnecessarily frightening. Even Dr. Birx and Fauci have been calling them out for their distortions, which are very blatant and causing harm.

James Annan said...

So, David, thank you for the interesting link to the total mortality. It is interesting that as you say there's an obvious spike in Italy over the last couple of weeks. Bearing in mind that it was increasing exponentially, and that the epidemic was focussed in one small area with about 10% of the national population but 60% of deaths, this data seems to show very clearly that if not controlled, mortality will be hugely above normal. It's not like the flattening off of the figures in Lombardy happened by itself. I'll leave you to work out how that graph would look if the remaining 90% of Italy's population had the same death rate. Then assume it doubles in three days, and doubles again in another three. And again once more. It would of course slow down some time after most of the population had been infected.

James Annan said...

I would however advise you to stop reading the "yellow journalism rags". It sounds like they are bad for your blood pressure. I stick to quality journalism myself.

James Annan said...

Death rate estimates still all over the place, here a post with two apparently credible papers giving widely spaced estimates based on the same Wuhan data set!

Unknown said...

The lower end of the lower estimate of mortality in that Columbia link (on the basis of 10% of cases coming to health services attention, which is consistent with Iceland data), which is 0.24% mortality, and assuming 50% infected, would give 75000 deaths in the UK. Also at least double that number in ICU for a week or so: there are about 4000 ICU beds so that would need to be spread over a year or so (assuming no-one else needed those beds!).

That seems roughly consistent with 'Diamond Princess' data too (~1% deaths in unrepresentatively old population). I think also roughly consistent with (lower bound from) Lombardy, where overall mortality is ~0.1% (very large error bars), and probably at most half were infected.

Phil said...

Wuhan data set is very messy.

Definition of case changed. They were making it up as they went along. Shortages of everything including time.

Who was tested depended on when. Ranged from "ICU admitting only" early to "anyone with a fever" later. At least some with just possible exposure. Also not very well documented.

Both many deaths and many milder cases were not counted. How many of both is mostly guesswork.

Phil said...

Oh, and I forgot to add this:

The first ever test developed in the world wasn't that close to ideal as you might expect. False negative rate was high, perhaps over 50% at first and improving with time to 3% or less. Dr Li Wenliang tested negative several times before testing positive. The test was good enough to help doctors so was very valuable... but the collected data isn't so valuable.

Better testing and better quality data are needed to resolve numbers.