Tuesday, March 24, 2020

A few more thoughts about parameter estimation and uncertainty in epidemic modelling

Last night's post was a bit rushed and devoid of analysis. I'm not surprised to see there have been some other recent model-fitting investigations by recognised researchers in the field, including this one. The model they used is in fact marginally simpler than the one I was considering in that it does not have a latent period, but on the other hand they explicitly model death and present their equation, which is quite handy as I think I should be able to add this to the SEIR model I'm using :-)

Something I should have pointed out yesterday: if all you have is data from the initial exponential part of an epidemic curve, then it's a straight line (in log space) and only constrains two degrees of freedom, a growth rate and a magnitude. I had 4 uncertain parameters: 3 biological (which jointly determine the rate) and a starting value (i.e. magnitude). Clearly, my problem is hugely underconstrained by this data and so my priors will have played a large role in determining the results.
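To make the point concrete, here is a minimal sketch, assuming a standard linearised SEIR model. The symbols beta (transmission rate), sigma (1/latent period) and gamma (1/infectious period) and all the numbers are my own illustrative choices, not the values from yesterday's fit. Two quite different sets of biological parameters give exactly the same early growth rate, and a straight-line fit to log cases recovers only a slope and an intercept.

```python
import numpy as np

def seir_growth_rate(beta, sigma, gamma):
    """Early exponential growth rate r of a linearised SEIR model.

    r is the positive root of (r + sigma) * (r + gamma) = beta * sigma.
    """
    return 0.5 * (-(sigma + gamma)
                  + np.sqrt((sigma - gamma) ** 2 + 4.0 * beta * sigma))

def beta_for_growth_rate(r, sigma, gamma):
    """Transmission rate that reproduces growth rate r for given sigma, gamma."""
    return (r + sigma) * (r + gamma) / sigma

target_r = 0.2  # per day, purely illustrative

# Two hypothetical (latent period, infectious period) pairs, in days.
param_sets = []
for latent_days, infectious_days in [(2.0, 3.0), (5.0, 7.0)]:
    sigma, gamma = 1.0 / latent_days, 1.0 / infectious_days
    beta = beta_for_growth_rate(target_r, sigma, gamma)
    param_sets.append((beta, sigma, gamma))

rates = [seir_growth_rate(*p) for p in param_sets]
r0_values = [beta / gamma for beta, _, gamma in param_sets]

# Noise-free synthetic data from the exponential phase: a straight line in log space.
days = np.arange(20)
log_cases = np.log(10.0) + target_r * days
slope, intercept = np.polyfit(days, log_cases, 1)

print("growth rates:", rates)        # identical by construction
print("implied R0 values:", r0_values)  # clearly different
print("fitted slope, intercept:", slope, intercept)
```

The fit returns two numbers however many parameters the model has, so anything beyond the growth rate and the magnitude has to come from the priors.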

Worse, if the data are likely to be under-reported by a large but unknown factor, then we can't constrain the magnitude at all. This is quite likely the case when we look at reported cases of illness, as the mildest cases may never be discovered. In the UK recently, many very likely cases are not tested at all unless the patient becomes seriously ill. If we include in our estimation process an under-reporting factor which is itself unknown, then the magnitude of the epidemic will also be unknown (ok, pedants will note it is bounded below, but probably by an unreasonably low value). This latest paper above and the Ferguson et al research both used deaths to calibrate their models. Death data are probably much more secure than cases of illness, but the relationship between infection and death is again highly uncertain. By assuming a very low death rate, we can estimate the current infection level to be high (so the epidemic is widespread but probably not so harmful), and conversely a high death rate means that current infection level is low and we are in for a bumpy future.

As far as I can see, the Oxford group has basically played this game by introducing a parameter to represent the proportion of the population which is susceptible to a severe form of the disease. This parameter acts as a simple scaling factor in the death equation for their model. When this is set very small, it means the epidemic is very large, which also means it started earlier than thought (though given the nature of exponential growth, not necessarily by an unbelievable margin). When the parameter is large, the epidemic is currently small but will get much more serious.
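The scaling argument can be sketched in a few lines. This is a rough illustration under my own assumptions, not the Oxford group's code: the function names, the death count, the probabilities and the doubling time are all hypothetical, and the lag between infection and death is ignored.

```python
import math

def implied_infections(deaths, severe_fraction, death_prob_given_severe):
    """Infections needed to explain a death count when only a fraction of
    the population is at risk of severe disease (lag ignored)."""
    return deaths / (severe_fraction * death_prob_given_severe)

def earlier_start_days(size_ratio, doubling_time_days):
    """How much earlier an exponentially growing epidemic must have started
    in order to be size_ratio times larger today."""
    growth_rate = math.log(2.0) / doubling_time_days
    return math.log(size_ratio) / growth_rate

deaths = 100              # hypothetical observed deaths
death_prob = 0.1          # hypothetical probability of death given severe disease
doubling_time_days = 3.0  # hypothetical doubling time

small = implied_infections(deaths, 0.01, death_prob)   # larger susceptible-to-severe fraction
large = implied_infections(deaths, 0.001, death_prob)  # ten times smaller fraction

print("implied infections:", small, large)
print("extra days of growth needed:",
      earlier_start_days(large / small, doubling_time_days))
```

Shrinking the scaling parameter by a factor of 10 multiplies the implied epidemic size by 10, but with a doubling time of a few days that only pushes the start date back by a week or two.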

These seem to be fairly fundamental road-blocks to doing any more detailed parameter estimation, so all results will necessarily be highly dependent on prior assumptions. Even the most elementary model has more degrees of freedom than can be constrained by time series data in the initial phase. It gets better once we are past the peak, as we know the proportion infected is then a substantial part of the population, and this will also help to get a handle on the fatality rate (though the lag means that could take a little longer). But by then it's too late to do anything about it.

Studying sub-populations that have already experienced the disease is one thing that may help. If the death rate is as low as one end of the Oxford paper suggests, how did 7 people die on that cruise ship (out of 700 cases, where there was regular testing and the total number of people was about 3500)? Bad luck, or were they just a particularly unhealthy bunch? And does their fit to Italian data (which implies that the epidemic is basically over) work for Lombardy as opposed to the whole country? Enquiring minds etc...

Edit: OK according to Wikipedia, Lombardy has 60% of the deaths but only 10% of the population. If the Oxford model (with low-death parameter) was fitted to this, I'm sure that it would put them right at the end of their epidemic with people no longer falling ill at much of a rate. However, despite the lockdown having been in effect for a while, there are still a lot of cases being reported. I say that refutes their idea, though it would take a proper calculation to be sure.

6 comments:

...and Then There's Physics said...

Doesn't the South Korean data largely contradict the assumption in the Oxford paper? As far as I can see, they assume (as I think you say) that only a small fraction of the population are likely to get a severe case and that a fraction of this population will then die. Hence, when you fit to the current data, you get a large fraction that must already be infected.

From what I found, South Korea reported just over 9000 cases with 120 deaths. If the result in the Oxford paper is correct, then it would seem to suggest that many more were infected than reported. However, my understanding was that the testing in South Korea was quite thorough, so this seems rather unlikely.

James Annan said...

Well you can always say "aha - lots of people were so mildly affected they weren't tested". But yes I'm sure you are right. On reflection I'm a bit surprised at the naivety of the work - it seems hardly any more advanced than what I was doing, despite the eminence of the team. But I suppose it is worthwhile to see what range of parameters can be compatible with the data they considered, rather than just assuming 1% like so many others. But still they could have considered more challenging data!

...and Then There's Physics said...

Actually, I'm not sure the South Korea data is inconsistent. They had 120 deaths. If you consider the extreme scenario in the Oxford paper (rho = 0.001) this would suggest about 1.2 million were infected, which is just over 2% of the population. They tested 300000 people and found just over 9000 were infected, which is about 3%.

Would seem to then depend on how they were doing the testing. I did see something on Twitter, though, which suggested that they'd identified most of the cases, which would then seem inconsistent with the results in the Oxford paper.

Also, presumably one way to check this would be to test a random sample of the population. If the Oxford paper is right, then a large fraction should be infected.

James Annan said...

Well they do suggest antibody testing as a way to check. But I really find it hard to believe we are anywhere close to their high scenarios. There would have been lots of unrelated cases of local transmission far earlier than was actually found in reality. Their analysis doesn't consider this factor at all.

James Annan said...

Here's a more detailed explanation of why the high Oxford number is not credible.

It is an interesting and important fact that during the exponential rise, the proportion ill at any given time is a large fraction of the total who have ever been ill (40% in my modelling; the exact fraction depends on parameters). So in order to get to 50% having had the disease, there would have been a point where about 20% had it at a moment in time (now!). The test stats only ever found under 10% positive even though they were preferentially testing suspect cases.
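A rough sketch of where a fraction like that comes from, assuming a plain SIR model in its exponential phase rather than the SEIR model behind the 40% figure. The parameter values are hypothetical and picked only so that the answer comes out at 0.4. With growth rate r = beta - gamma, the currently ill grow as dI/dt = r*I and the cumulative total as dC/dt = beta*I, so I/C tends to r/(r + gamma).

```python
def prevalence_share(growth_rate, recovery_rate):
    """Asymptotic ratio of currently ill to ever ill during exponential growth
    in a simple SIR model: r / (r + gamma)."""
    return growth_rate / (growth_rate + recovery_rate)

def simulate_share(beta, gamma, days=100, dt=0.01):
    """Euler integration of the linearised SIR equations as a numerical check."""
    ill, ever_ill = 1.0, 1.0
    for _ in range(int(days / dt)):
        new_infections = beta * ill * dt
        ever_ill += new_infections
        ill += new_infections - gamma * ill * dt
    return ill / ever_ill

beta, gamma = 0.5, 0.3  # hypothetical rates per day, giving r = 0.2
share = prevalence_share(beta - gamma, gamma)

print("analytic share:", share)
print("simulated share:", simulate_share(beta, gamma))
# If half the population has ever been ill, roughly this fraction is ill right now:
print("ill now:", 0.5 * share)
```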

steven said...

"Actually, I'm not sure the South Korea data is inconsistent. They had 120 deaths. If you consider the extreme scenario in the Oxford paper (rho = 0.001) this would suggest about 1.2 million were infected, which is just over 2% of the population. They tested 300000 people and found just over 9000 were infected, which is about 3%.

Would seem to then depend on how they were doing the testing. I did see something on Twitter, though, which suggested that they'd identified most of the cases, which would then seem inconsistent with the results in the Oxford paper."

South Korea data, cause I live there. (deaths are up to 126)

Traced: 357,896
Positive: 9,137
Negative: 334,481
Pending: 14,278

Positive rate ~2.6 %

Testing is done based on contacts and at risk facilities.
Example:
"From the call center building in Guro-gu, Seoul, no additional cases were confirmed. The current total is 158 confirmed cases since 8 March. Of the 158 confirmed cases, 97 are persons who worked in the building (11th floor = 94; 10th floor = 2; 9th floor = 1), and 61 are their contacts. The KCDC shared the interim result of their epidemiological investigation in collaboration with Seoul City, Incheon City, and Gyeonggi Province during the monitoring period of 9-22 March. The call center on the 11th floor had the highest infection rate (43.5%), compared to 7.5% and 0.5% for 10th and 9th floors, respectively. There was no confirmed case from other floors. Of the 226 persons identified as family members of the 97 confirmed cases who worked in the building, 34 (15.0%) were infected. Of the 97 confirmed cases, 8 (8.2%) were asymptomatic cases. Of the 16 persons identified as family members of the 8 asymptomatic confirmed cases, no confirmed case was found."

Example:
In Daegu, testing has been completed for every person at high-risk facilities. Of the 32,990 test results, 224 (0.7%) were positive results.


Hopefully they will start to post positive rates of INBOUND travellers, which are driving our positives every day. 51 positives from all travellers landing.
Inbound travellers are now all tested and quarantined.
Same in China.