Saturday, July 18, 2020

Mountains and molehills

A couple of weeks ago, I heard about an issue with the way COVID deaths are counted in England. It seems that PHE are going through the lists of people who had died every day, and checking to see if they had previously had a positive COVID test. If so, they are added to the total number of COVID deaths for the day, even if they had long since made a full recovery and were run over by a bus (or died of some other illness).

Clearly this is wrong, and will tend to overstate the number of people killed by the disease. Equally clearly, there aren't many deaths falling into this situation. Take the total number of 300k positive tests, assume this means 300k people (which it doesn't, as many people are tested more than once) and that they have an average remaining life span of 40 years. Then we'd expect to see about 20 of them die every day from all causes (300k people divided by 40 × 365 days), implying about this many of the daily "COVID deaths" in the PHE stats are bogus. That took me under 5 mins to work out, so I shrugged and ignored the issue. My number might not be quite right, as the 40 years of remaining life will depend on the precise age/gender distribution of those who have tested positive, but it's hard to see it being too far wrong. In the face of 100 deaths per day, about 20 of them being erroneously counted is not a huge issue, though it would become more of a problem as/when the daily death toll shrinks further. It certainly has little bearing on any retrospective analysis of the size of the outbreak so far.
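For anyone who wants to poke at the assumptions, the back-of-envelope sum can be written as a short Python sketch. The function name and the 365.25 days-per-year constant are illustrative choices, and the inputs are just the round numbers quoted above, not official data.

```python
# Back-of-envelope estimate of all-cause daily deaths among people who
# have ever tested positive. Inputs are the round numbers from the post.
POSITIVE_TESTS = 300_000      # treated as distinct people (an overestimate)
REMAINING_LIFE_YEARS = 40     # assumed average remaining life span
DAYS_PER_YEAR = 365.25        # illustrative constant


def expected_daily_deaths(people: int, remaining_years: float) -> float:
    """Crude steady-state estimate: each person contributes
    1 / (remaining life in days) expected deaths per day."""
    return people / (remaining_years * DAYS_PER_YEAR)


estimate = expected_daily_deaths(POSITIVE_TESTS, REMAINING_LIFE_YEARS)
print(f"Expected all-cause deaths per day: {estimate:.1f}")
```

Halving the assumed remaining life span to 20 years doubles the answer, which gives a feel for how sensitive the estimate is to the age mix of those tested.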

Two weeks later, Loke (from whom, I now note, I first learnt about this issue) and Heneghan have written an article covering it, and promoted it all round the press. I'm sure it is just an unfortunate accident that they make it sound like a really big issue that is a major factor in explaining why the death toll in England has remained so high, as they are surely competent enough to have reproduced the calculation I presented above. Unsurprisingly, it's been picked up by the denialist wing of the media, which is desperately trying to pretend that the response in England has been anything other than absolutely terrible. It's probably worth mentioning that Heneghan has form for minimising the dangers of COVID: in this piece he argues that the fatality rate is down around 0.2%, which is far below all credible estimates I've seen and implies that a very large proportion of the UK population has had the disease, which is robustly refuted by a hefty pile of evidence.

Now Hancock has called for an "urgent inquiry" into this and is using it as an excuse to halt publication of the daily statistics. Even though he's a bit dim, it's hard to believe he doesn't have any numerate advisors who could tell him why it's not that big a deal. Indeed PHE quickly put out a rebuttal which supports my analysis: they pointed out that 90% of the COVID deaths occurred within 28 days of diagnosis, and of the remaining 4000, half were directly attributed to COVID on the death certificate anyway, leaving perhaps 2000 bogus deaths which should not have been added. Over a ~100 day period that's pretty much the same as the 20 per day figure I came up with above.
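The cross-check against the PHE numbers is equally simple. A minimal sketch, using only the rounded figures quoted above (the variable names are illustrative):

```python
# Cross-check using the rounded figures from the PHE rebuttal as quoted above.
LATE_DEATHS = 4000            # deaths more than 28 days after diagnosis
ATTRIBUTED_FRACTION = 0.5     # share with COVID on the death certificate
PERIOD_DAYS = 100             # approximate length of the period

# Deaths that were late AND not attributed to COVID by a doctor.
bogus_deaths = LATE_DEATHS * (1 - ATTRIBUTED_FRACTION)
bogus_per_day = bogus_deaths / PERIOD_DAYS

print(f"Bogus deaths: {bogus_deaths:.0f}, or {bogus_per_day:.0f} per day")
```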

Compare and contrast with the known under-reporting which is clear from the total death statistics and perhaps most stark in care homes, where the total "non-COVID" deaths have a massive bump coincident with the epidemic. We know that patients were pushed out of hospitals into care homes, without any testing or facilities for safe care and treatment, and it's clear that many thousands of these people died without being counted in the COVID statistics. See the huge yellow bump in the official ONS statistics below: 

This miscoding of unrelated deaths is small beer in comparison.

One way of getting the "correct" answer would be to use excess deaths, but that involves a certain amount of statistical work (excess over what, and how is this calculated?) and is not so quick and easy to come by. So I don't know what they will come up with as a solution. Using a cut-off date might be a reasonable solution, perhaps in conjunction with the death certificate for cases where it didn't specify COVID as the cause. I.e., cut out those 2k deaths which both (a) took more than 28d from diagnosis to death and (b) were not directly attributed to COVID by a doctor. That would seem to minimise any errors in a straightforward manner, so probably they will do something more complicated...
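As a rough illustration of the rule suggested above, here is a minimal Python sketch. The `DeathRecord` type, its field names, and the three sample records are all hypothetical, invented purely to show the logic; they do not reflect how PHE actually stores or processes its data.

```python
from dataclasses import dataclass
from datetime import date

CUTOFF_DAYS = 28  # the 28-day cut-off discussed above


@dataclass
class DeathRecord:
    """Hypothetical record; field names are invented for illustration."""
    diagnosis_date: date
    death_date: date
    covid_on_certificate: bool


def counts_as_covid_death(rec: DeathRecord, cutoff_days: int = CUTOFF_DAYS) -> bool:
    """Exclude a death only if it is BOTH more than `cutoff_days` after
    diagnosis AND not attributed to COVID on the death certificate."""
    days_since_diagnosis = (rec.death_date - rec.diagnosis_date).days
    return days_since_diagnosis <= cutoff_days or rec.covid_on_certificate


# Made-up sample records, not real data.
records = [
    DeathRecord(date(2020, 4, 1), date(2020, 4, 10), False),   # within 28 days
    DeathRecord(date(2020, 3, 20), date(2020, 6, 15), True),   # late, certified
    DeathRecord(date(2020, 3, 20), date(2020, 6, 15), False),  # late, not certified
]

kept = [rec for rec in records if counts_as_covid_death(rec)]
print(f"{len(kept)} of {len(records)} deaths counted")
```

Only the third record, late and uncertified, is dropped; the first two are kept by one criterion each.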


crandles said...

Mountains and mole tunnels perhaps?

Presumably the mole tunnel grows larger the longer it goes on: for the first 28 days there is no difference from the Welsh and Scottish 28-day cut-off, then it starts as a tiny effect, but it grows over time as more people have had a positive test more than 28 days previously.

At some point, some form of review to ensure numbers are appropriate and consistent would be sensible? So better earlier than later even if this issue looks minor at present?

James Annan said...

Oh yes I absolutely agree 100% that they should improve the method. But I wanted to emphasise that this is a small issue for the historical record that in no way invalidates any analyses on the data so far.

It will never be more than 20 miscounted deaths per day, at least until the number of cases rises substantially.