Thursday, June 04, 2020

Doubling times and the curious case of the dog that didn't bark

I'm sure many of you spend your mornings glued to Parliament TV, but for anyone who missed it, there was a House of Lords Science and Technology Select Committee hearing on Tuesday, which discussed the science of Covid-19 with a particular focus on modelling. There are transcripts (two parts) here, and there is a video of the whole event here.

Two things particularly caught my attention. The first bit is Question 31 in the transcript, at about 10:47, when they were asked about the 20,000 death estimate that Vallance had hoped to undershoot. Matt Keeling first excused this by saying that the lockdown could have been stricter. However, Ferguson had already confirmed that the lockdown was in fact adhered to more strictly than the modellers had anticipated. (Keeling also went on to talk about the failure to protect care homes, which is probably fair enough, though the 20,000 target would have been badly overshot in any case.) It would surely have been appropriate at this point to mention that the 20k target was already missed at the point that the lockdown was imposed on the 23rd March. By then we had well over a million infected people, which already guaranteed well over 10k deaths, and there was no plausible way of stopping the outbreak dead in its tracks at that point. The decay was always going to be slower than the growth, meaning more cases and deaths.
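The arithmetic behind that claim is trivial but worth spelling out; here is a sketch, using an illustrative infection fatality rate of about 1% (an assumption for the example, and before counting any of the further infections that were unavoidable during the decay):

```python
# Back-of-envelope for deaths already "locked in" on lockdown day.
# The IFR here is an illustrative assumption, not a fitted value.
infected = 1_000_000   # "well over a million" infected by 23rd March
ifr = 0.01             # assumed infection fatality rate of ~1%

deaths_locked_in = infected * ifr
print(int(deaths_locked_in))  # 10000 -- before any further spread at all
```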

Then in the follow-up question, the witnesses were asked what had most surprised them compared to their earlier predictions. Ferguson answered that the biggest factor was that more infection had been imported from Spain and Italy, and that this is why we were further ahead on the curve than anticipated! The doubling time has nothing at all to do with how quickly infected people arrived, of course, and getting that wrong made a far greater difference to how things turned out.

It's curious that in two hours of discussion about the modelling, it never once came up that the modellers' estimate of the doubling time was 5-7 days up to the 18th March, and abruptly changed to 3-5 days on the 20th (reaching SAGE on the 23rd, according to the documentation).

Monday, June 01, 2020

The Dim and Dom show

I feel like I should be blogging about something related to the ongoing epidemic, but I can't bring myself to do it. The utterly vacuous, self-destructive, hopelessly incompetent nature of our government is beyond my ability to put into words. I am surprised at the scientists who are still prepared to work with them over the epidemic.

That aside, it's been an interesting couple of weeks. I'd been doing more of the same modelling and forecasting of the epidemic (and have updated our paper and submitted it to a real journal), and then suddenly the media got hold of the delayed lockdown story. This is a very simple calculation – one I initially thought too trivial to even write up as a blog post – but it is of course very eye-catching. After mentions in the Guardian, the Telegraph and More or Less, some requests for interviews came in. Initially I ducked them, as I didn't really think it was appropriate for a non-expert to be pushing his own research, especially as no-one else had backed it up at that point, and ATTP had tried to get results out of the IC model but initially came up with some significantly different answers (after a few more attempts at getting the code to do the right things it worked very nicely though). Kit did a very good job on Sky, I thought:

and then I found this manuscript (also written by an outsider, a mathematical modeller like me) and the research showing essentially the same results for the USA (manuscript here; I think the smaller effect is mostly because they looked at a shorter interval), and also the Sunday Times article which managed to claim it was all new research from the IC team, so I relented and did an interview for Vanessa Feltz on Radio 2 (which was live):

and also for the German channel ZDF, which was recorded on Friday. Whether it will/did make the cut remains to be seen...they said they would send a link to the final version so I wait with bated breath.

Thursday, May 21, 2020

The EGU review

Well.. that was a very different EGU!

We were supposed to be in Vienna, but that was all cancelled a while back of course. I might have felt sorry for my AirBnB host, but despite Austria banning everything they didn’t reply to my communication and refused a refund, so when AirBnB eventually (after a lot of ducking and weaving) stepped in, over-ruled them and gave me my money back, I didn’t have much sympathy. They weren’t our usual host, who was already full when I booked a bit late this year.

Rather than the easy option of just cancelling the meeting, the EGU decided to put everything on-line. They didn’t arrange videoconferencing sessions – I think this was probably partly due to the short notice, and also to make everything as simple and accessible as possible to people who might not have had great home broadband or the ability to use streaming software – but instead we had on-line chat (typing) sessions with presentation material previously uploaded by authors, that we could refer to as we liked. There was no formal division into posters and oral presentations. Authors could put up whatever they wanted (50MB max) onto the website beforehand and people were free to download and browse through at will. It is all still up there and available to all permanently, and you can comment on individual presentations up to the end of the month (assuming the authors have allowed this, which most seem to). The EGU has posted this blog with statistics of attendance which shows it to have been an impressive success.

Some people put up huge presentations, far more than they would have managed in a 15 minute slot, but most were more reasonable and presented a short summary. We did poster format for ours as we felt that this allowed more space for text explanation and an easier browsing experience than a sequence of slides with bullet points. Unfortunately my personal program of sessions I had decided to attend has been deleted from the system so I can’t review what I saw in much detail. I usually take notes but this time was too busy with computer screens.

Of course, being in Vienna in spirit, I had to have a schnitzel. I might have to have some more in the future, they were rather good and quite easy to make. Pork fillet, not veal.

The 2nd portion at the end of the week was better as I made my own breadcrumbs rather than using up some ancient panko that was skulking in the back of the cupboard. But we ate them too quickly to take pictures! Figlmüller eat your heart out!

The chat sessions were a bit frenetic. Mostly, the convenors invited each author in turn to post a few sentences in summary, following which there was a short Q-and-A free-for-all. This only allowed for about 5 mins per presentation, which meant maybe 2 or 3 questions. But this wasn’t quite as bad as it seems since it was easy to scroll through the uploaded material ahead of time and pick out the interesting ones. Questioning could also run over subsequent presentations; it wasn’t too hard to keep track of who was asking what if you made the effort. As usual, there were only a handful of interesting presentations per session for me (at most) so it was easy enough to focus on these. It was also possible to be in several different chat sessions at once, which you can’t do so easily with physical presentations! The structure made it more feasible to focus on whatever piqued our interest, and jules in particular spent more time at those sessions she does not usually get around to attending because they are outside of her main focus. Some convenors grouped presentations into themes and discussed 3-5 of them at a time, for longer. Some naughty convenors thought they would be clever and organise videoconferencing sessions outside of the EGU system, which actually worked pretty well in practice for those (probably a large majority to be honest) who could access it, but not so well for those who had access blocked for a number of reasons. Which is probably why the EGU didn’t organise this themselves. Whether it is actually preferable to the on-line chat is a matter of taste.

Jules was co-convening a couple of sessions and the convenors set up a small zoom session on the side to help coordinate, which added to the fun. A bit of personal chat with colleagues is an important aspect of these conferences. Her presentation is here and outlines some early steps in some work we are currently doing – an update to our previous estimate of the LGM climate, which is now getting on for 10 years (and two PMIP/CMIP cycles) old. I think we should probably find it encouraging that the new models don’t seem very different, though it may just mean that they share the same faults! There is some new data, perhaps not as much as we had hoped. And the method itself could do with a little bit of improvement.

I had actually found it a bit difficult to find the right session for my work when originally submitting it. It didn’t seem to quite fit anywhere, but in the end it turned out fine where I put it. The data assimilation stuff was a little less interesting methodologically speaking, perhaps because it’s a sufficiently mature field that everyone is just getting on with the nuts and bolts of doing it rather than inventing new approaches. I did get one idea out of it that I may end up using though, and this from the Japanese looks absolutely incredible from a technological point of view – nowcasting cloudbursts over Tokyo with a 30-second update cycle! With the extra year they’ve now got, it will probably be operational for the Olympics.

Jules and I also co-authored Martin’s work with us on emergent paleoconstraints, which we were originally going to present for him as he wasn’t planning to attend. But with the remote attendance he ended up being able to do it himself, which was a small bonus.

Best of all – no coffee queues! Well that and not needing to schlep out at 8pm looking for dinner each night…which is fun but gets pretty tiring by the end of the week. On the downside, we had to buy our own lunches rather than gatecrashing freebies all week like we usually (try to) do.

As for the future…well it seems pretty embarrassing that it took current events to force the EGU into moving on-line. Some of us have been pushing them on this for years and it’s always been met with “it’s too complicated” by the powers that be. I suspect they mostly like the idea of being in charge of a huge event and enjoy hobnobbing at all the free dinners (don’t we all!) but that doesn’t justify forcing everyone to fly over there and spend at least €2k – probably rather more for most – to take part. It’s a huge amount of time, money, and carbon and we really ought to do better. If one good thing is to come out of the current mess, it might be that people finally wake up to the idea that working remotely really is fully feasible these days with the level of communication technology that is available. Blue Skies Research has been living your future life for more than 5 years now, and it’s great! Roll on next year. I know that turning up has added benefits, and don’t expect all travel to stop. But with remote access, people can easily “go” to both the AGU and EGU each year, dropping in to the bits that interest them, without having to devote a full week and more to each, with huge costs, jet-lag, the carbon budget of a small country, etc.

I expect that the AGU will want to put on a better show this December. Even if travel is opened up by then (which I wouldn’t be confident about at this point) I doubt this will happen quickly enough for the event to be organised in the usual manner. It will be good to have a bit of friendly rivalry to spur things on. In recent years, the AGU has generally been ahead of the EGU in terms of streaming and remote access – last December we watched a couple of live sessions and even asked a question (via text chat) though we were lucky that the small selection of streamed sessions included stuff of interest to us. The EGU has tended to put up streams of just a few of the public debate sessions rather than the science, and this only after the event with no opportunity for direct interaction. Bandwidth is a problem for streaming multiple sessions from the same location, but maybe even an audio stream with downloadable material would work? One thing is for sure, back to “business as usual” is not going to be acceptable now that they’ve shown it can be done differently.

Here’s Karlskirche which I hope to see again in the flesh some time.


Coincidentally, just a few days after the EGU I took part in this one-day webinar. It had a bit of the same sort of stuff – I presented the same work again, anyway! This was a zoom session which worked pretty well; there were one or two technical problems, but you usually get those in a real conference anyway with people plugging their laptops into the projector. It was great to have people from a range of countries attend and present at what would normally have been a local UK meeting of climathnet people. I have never quite managed to attend any of these before because they always seemed like a long way to travel for a short meeting that mostly isn’t directly relevant to our research. I expect to see a rapid expansion of remote meetings of various types in the future.

Tuesday, May 19, 2020


There I was, thinking I was typing into the void...and it turns out the comment notification had got turned off so I hadn't seen them. As well as lots of unread comments, there were quite a few stuck in moderation (it's off by default, but I think that goes on automatically after a period of time).

I am having a look back but if I've missed anything specific please copy and post again so I notice. For the most part it looks like you've answered each other which is helpful :-)

Monday, May 18, 2020

Strategy for a Pandemic: The UK and COVID-19

Sir Lawrence Freedman (a member of the Chilcot Inquiry) has written a review of the UK Govt's response to the coronavirus outbreak, which can be found here.

He explains his motives thusly:

"The inquiry into the United Kingdom’s role in the 2003 Iraq War, of which I was a member, took the view that when inquiring into a contentious area of policymaking, an essential first step was compiling a reliable account. This should be influenced as little as possible by the benefit of hindsight. This article attempts to provide a preliminary account of the development of UK strategy on COVID-19, from the first evidence of trouble in Wuhan in early January to the announcement of the full lockdown on 23 March. As policy-makers claimed to be ‘following the science’, this requires an analysis of the way that the expert community assessed the new coronavirus, the effects of alternative interventions intended to contain its spread and moderate its impact, and how experts’ findings were fed into the policymaking process. It is preliminary because, while there is good material on both the policy inputs and outputs, the material on how the policy was actually made is more speculative."

It's an interesting read, but while reading it I can't help but think of Orwell's aphorism:
"Who controls the past controls the future."
Here is an interesting snippet in which there seems to be a very clear and perhaps important misunderstanding of the time line. Freedman says on p52:

"By that time, the strategy had already begun to shift. Hours after the COBRA meeting, on the evening of 12 March, SAGE met again to hear from Professor Ferguson on the results of his group’s latest modelling. The conclusions, which were made public on 16 March, were startling. What had made the difference was evidence from Italy suggesting that the R0 was more like 3 than 2.5 and, most importantly, that previous estimates of intensive-care requirements had been optimistic."

The paper itself is of course published and uses an R value of 2.4 in the main analysis of mitigation scenarios, with a range of 2.0-2.6 in sensitivity tests. The oral hearing of the Science and Technology Committee that Freedman cites as the source of his information took place on Wednesday 25 March 2020 and can be found here. Ferguson is on from 10:15 onwards, with the relevant comments about R0 right at the end of his segment, around 10:55. He says rather disingenuously that the new estimate for R0 of around 3 is "within the wide range of values" that had been considered by modelling groups. Certainly not his, and when you take the doubling time into account, it is very much at the edge of Kucharski's work too.

I think Ferguson is on very dodgy ground indeed in so blithely dismissing this discrepancy in front of the Select Committee, as it is critical to the question of how soon and how aggressively we needed to deal with the epidemic. Note that the doubling time (which is what really matters here) depends not only on R0 but also on the reproductive time scale of the virus. In fact, as I have documented previously, the SPI-M advice specifically pointed to a 5-7 day doubling time as late as the 18th March, at which point they were considering a lockdown for London (only). It was only at the meeting of the 23rd, long after the 12 March date that Freedman refers to, that SAGE learnt of the change of the estimate to 3-5 day doubling, and the lockdown was ordered that same evening. I am no friend of the Tories and there are lots of things they did badly, but specifically in terms of reacting to the abruptly and radically updated scientific advice, their response seems exemplary here.

Also, on p58:
"Given the known sequence for infection, incubation, hospitalisation and death, it is reasonable to conclude that changes in behaviour were having an effect well before 23 March, especially in London."

This may be possible but does not seem necessary. I'm not just drawing on my own modelling here: Flaxman et al consider all the interventions and also find that the lockdown had by far the largest effect on the epidemic, with the other earlier interventions being very minor influences in comparison. Their latest estimate shows R dropping from 3.9 to about 3.5 during the week prior, then collapsing to about 0.7 on the 23rd, very similar to my own estimate. (As I've discussed before, their slightly larger initial and lower current values for R can probably be attributed to a longer serial interval of 6.5d in their model compared to about 5.5d in mine.) Here are both of our latest results, mine as the top plot and theirs in the following two:

Freedman's rosy assessment from p57 onwards of the NHS coping may not be shared by all, particularly the large number of victims who were shut out by the NHS and sent out into the community to die in care homes while infecting many others, with both NHS and care home staff also inadequately protected. If the NHS really had capacity, why did this happen? I know he refers to this subsequently, but doesn't seem to make the connection. "Coping" by refusing treatment to large numbers of sick and dying people isn't really coping, is it?

Anyway, it's an interesting read.

Friday, May 15, 2020

Why can't the Germans be more like us?

Germany locked down at about the same time as the UK. Actually probably a couple of days earlier, according to Wikipedia and Flaxman et al. Picking a single date is a bit subjective really, but for the purposes of this post I’ll choose the 21st March. So why are they doing so much better than the UK? Well, the main reason is just that they were at a much earlier stage in their epidemic. On the 23rd, the UK had had 508 deaths. On the 21st, Germany was at 47. So that’s a factor of about 10. They were about 9 or 10 days behind us in terms of where they were on the upslope. 10 days ahead of our lockdown, Vallance was saying we had to be careful not to clamp down too soon. What would have happened if the Germans had waited a week?

This is of course quite an easy calculation to do. I can fit the model as before, and then run a simulation with the lockdown date delayed a week. Here it is, looking roughly similar to the post I just did for the UK, except this time the blue line goes higher due to locking down later. Sorry for over-writing the lockdown dates. Never mind. They are 21 and 28 March.
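For anyone who wants to play with this kind of counterfactual, here is a minimal SEIR sketch – illustrative parameter values only, not my fitted model – showing that delaying the step change in R by a week roughly quadruples the eventual total:

```python
def seir_total_infections(lockdown_day, days=150, N=1e8, R0=3.0, Rt=0.7,
                          T_lat=3.0, T_inf=4.0, seed=100, dt=0.1):
    """Minimal SEIR with a step change in R at the lockdown date.

    All parameter values are illustrative assumptions, not fitted ones.
    With T_lat=3 and T_inf=4, R0=3 gives a doubling time of roughly 3.3 days.
    """
    S, E, I = N - seed, float(seed), 0.0
    total = float(seed)                            # cumulative infections
    for step in range(int(days / dt)):
        R = R0 if step * dt < lockdown_day else Rt
        new_exposed = (R / T_inf) * I * S / N * dt  # force of infection
        new_infectious = E / T_lat * dt             # E -> I transitions
        S -= new_exposed
        E += new_exposed - new_infectious
        I += new_infectious - (I / T_inf) * dt      # I -> removed
        total += new_exposed
    return total

on_time = seir_total_infections(lockdown_day=35)
week_late = seir_total_infections(lockdown_day=42)
print(round(week_late / on_time, 1))  # roughly 4: a week at ~3.3-day doubling
```

The ratio comes out slightly below the pure-exponential value because the later lockdown scenario gets a small extra brake from depletion of susceptibles, which is the same "hint of herd immunity" effect discussed in the UK post below.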

Playing the same game as before, sliding the blue graph along by a week (backwards this time) and then dividing it by 4…and it hits the magenta one again, and even has the same small mismatch due to a hint of herd immunity at the right hand edge. Once again, the doubling time I get from the fit is 3.5 days so delaying by a week would have quadrupled the death toll. These total death tolls are the integral all the way down the slope off the end of the graph, by the way, which is why the total both here and in the previous post is a bit higher than the current number.

It would still have been a little bit less than ours, but it would have been close. Good to know we can still beat the Germans at something.

As for why our total is only about 5x the German one rather than the factor of 10 that we had on our lockdown day...mostly just the random deviations from the exponential slope at the start. By early April (and therefore too soon for it to have been a result of the policy) the ratio in death totals was only 6, and it's stayed close to that ever since. Their lockdown also seems to have been a bit more effective, in terms of the estimated Rt value. Probably they didn't have the same care home fiasco which is currently fuelling our outbreak.

Wednesday, May 13, 2020

The human cost of delaying lockdown

A while ago, I mentioned that the cost of delaying lockdown by a week was to increase illness and death by a factor of 5, based on the doubling time of 3 days that the virus seemed to have at the start.
It’s a simple result but quite striking and perhaps counterintuitive, so here it is in more detail (and with slightly different numbers).

I’ve been fitting the SEIR epidemic model to the daily death data, and here is the latest hot-off-the-press version.

The magenta line is the median of my model fit, and the red circles are the data, though I have smoothed them a little to reduce the huge weekly cycle in reporting (Sun/Mon are always really low, then Tuesday really high).

This model allows the reproduction number to change at the lockdown date, and estimates the two values (which I call R0 and Rt) by fitting to the data. Taking that central magenta estimate, it is easy to re-run the model assuming the same change happened a week earlier. And this is what we get:

Magenta is as above, and blue is what happens if I make the change in R one week earlier, on the date of the blue vertical line. How did I know it would cause such a large reduction in deaths? The doubling time in the early phase is 3.5 days here (not 3 days as I got previously, told you the numbers were slightly different). So the size of the epidemic on this new lockdown date is exactly 1/4 the size it was on the later date. And the behaviour of exponentials (both growing and declining) is such that every day before or after the lockdown, the total size in the hypothetical case is also 1/4 what it was the same number of days before or after the lockdown in reality. The next plot shows this more clearly. I have just shifted the blue line forward by 1 week to make the lockdown dates coincide.

See the same shape, just lower? The logarithmic y-axis that I’m using means that a constant vertical distance between the solid blue and magenta lines corresponds to a constant ratio in numerical values, of 4 in this case. So the total number of deaths is also smaller by a factor of 4. The dashed blue line is the same model output as the solid blue line, only I’ve multiplied it by 4. You can see it overlies the original magenta almost exactly. Just towards the right hand edge of the graph there is a small mismatch, which is due to the magenta case benefiting from a slightly enhanced decline from a hint of the “herd immunity” phenomenon. That is to say, with roughly 10% of the population having suffered from the disease in that scenario, these people (assumed to be immune) reduce the spread of the disease just enough for the lines to look a little bit different.
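The arithmetic is simple enough to write down in two lines; the 3.5-day doubling time is the one from this fit, and the 3-day case recovers the "factor of 5" from the earlier post:

```python
# Ratio of epidemic sizes caused by delaying lockdown, assuming pure
# exponential growth throughout the delay.
def delay_factor(delay_days, doubling_time):
    return 2 ** (delay_days / doubling_time)

print(delay_factor(7, 3.5))            # 4.0: a week's delay quadruples everything
print(round(delay_factor(7, 3.0), 2))  # 5.04: the earlier "factor of 5" figure
```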

So, with these numbers that represent an initial doubling time of 3.5 days, we see that implementing the lockdown one week earlier would have saved about 30,000 lives in the current wave (based on official numbers, which are themselves a substantial underestimate). It would also have made for a shorter, cheaper, less damaging lockdown in economic terms. And this is all quite simple maths that every single modeller involved in SAGE was fully aware of at the time.

Tuesday, May 12, 2020

What can we learn about the COVID fatality rate from Guayas?

Guayas is a region in Ecuador that has had a particularly tough time with COVID-19. Prompted by this Twitter post from Karsten Haustein I have done a bit of modelling…
The daily death totals are available from here, from which I could also work out that the typical background mortality was about 60 per day. They hit a peak of over 700(!) and the total excess deaths look like about 12k in a few weeks (out of 4.5 million, that’s over 0.25%). So that in itself puts a lower bound of that magnitude on the “infection fatality rate” or IFR in that region. (I think Karsten’s number on the tweet could be open to misinterpretation: the excess being half their annual mortality is true enough, but they are a young, growing region so it’s a lower percentage of total population than 300k would be in the UK.)
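The lower-bound arithmetic, with the approximate numbers from above:

```python
excess_deaths = 12_000     # approximate excess deaths over a few weeks
population = 4_500_000     # Guayas, roughly

# Even if literally everyone in the region had been infected, the IFR
# could not be lower than deaths/population:
ifr_lower_bound = excess_deaths / population
print(f"{ifr_lower_bound:.2%}")   # 0.27%
```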

But maybe modelling could shed a little more light?

It was a bit of a challenge to get the model to work well on this data set and I had to tweak it a bit. Most importantly, the time to death distribution in the model seems too broad and flat, and I had to sharpen it significantly to be able to reproduce the peak. This seems intuitively reasonable, as I’m sure they didn’t have thousands of people kept alive on banks of ventilators, but on the other hand I have no rigorous basis for this modification, so the post is a bit handwavy. I think it’s reasonable, but different choices might have resulted in different results. I also changed the way I am handling model error, as I wanted to really explore how well I could fit the data. My aim was to see if I could fit the data with a range of different IFR values, and perhaps infer what values might be compatible with the data.

Without further ado, some results. I fixed the IFR at various different values as shown (just by using a really narrow prior centred on those values).

Using 0.2%, it’s a horrible fit that massively underestimates the peak. Not surprising really, given that 0.27% of the whole population died. In fact despite the tight prior, it refuses to stick at that value and drifts up to 0.21%. Even the simulation with 0.3% is not great. The logarithmic scale of the plot flatters it a bit, and it stays comfortably under the peak for quite some time. Interestingly, Rt is estimated to be significantly over 1 during the declines here (despite the attempt at control), because the herd immunity is sufficiently high to play a significant role in squashing the epidemic. IFR=0.4% gives an entirely satisfactory simulation, indistinguishable from the IFR=1% case (at which herd immunity ceases to be a significant factor). The only tell-tale difference is in the Rt values obtained. I suspect that the 0.5 value on the 1% plot is a bit optimistic, we haven’t managed that anywhere in Europe despite being much richer and not having been completely overwhelmed by the epidemic to anything like the same extent. On the other hand we all did better than Rt=0.94.

Splitting the difference, IFR =0.7% is visually indistinguishable again, with Rt = 0.58 being a bit less optimistic than the 1% run. This value for IFR (well, 0.75%) is what I’ve been using in all of my modelling.

Ecuador is a very young country compared to the UK (which would point to a lower IFR) but also much poorer and obviously healthcare was completely overwhelmed with this epidemic (which would point to a higher one). Do these factors cancel, compared to the UK? I have no idea, but I would think that epidemiological modellers might be able to draw more concrete conclusions than I am prepared to do.

If I could find daily data from Bergamo, Italy, I could play the same game there. Lombardy as a whole is only at the 0.14% fatality level which I think won’t be enough to be useful in the same way.

Wednesday, April 22, 2020

Bayesian deconstruction of climate sensitivity estimates using simple models: implicit priors and the confusion of the inverse

It wasn’t really my intention, but somehow we never came up with a proper title so now we’re stuck with it!

This paper was born out of our long visit to Hamburg a few years ago, from some discussions relating to estimates of climate sensitivity. It was observed that there were two distinct ways of analysing the temperature trend over the 20th century: you could either (a): take an estimate of forced temperature change, and an estimate of the net forcing (accounting for ocean heat uptake) and divide one by the other, like Gregory et al, or else (b): use an explicitly Bayesian method in which you start with a prior over sensitivity (and an estimate of the forcing change), perform an energy balance calculation and update according to how well the calculation agrees with the observed warming, like this paper (though that one uses a slightly more complex model and obs – in principle the same model and obs could have been used though).

These give slightly different answers, raising the question of (a) why? and (b) is there a way of doing the first one that makes it look like a Bayesian calculation?

This is closely related to an issue that Nic Lewis once discussed many years ago with reference to the IPCC AR4, but that never got written up AFAIK and is a bit lost in the weeds. If you look carefully, you can see a clue in the caption to Figure 1, Box 10.1 in the AR4 where it says of the studies:
some are shown for different prior distributions than in the original studies
Anyway, there is a broader story to tell, because this issue also pops up in other areas including our own paleoclimate research (slaps wrist). The basic point we try to argue in the manuscript is that when a temperature (change) is observed, it can usually be assumed to be the result of a measurement equation like:

T_O = T_T + e        (1)

where T_O is the numerical value observed, T_T is the actual true value, and e is an observational error which we assume to be drawn from a known distribution, probably Gaussian N(0, σ²) though it doesn’t have to be. The critical point is that this equation automatically describes a likelihood P(T_O | T_T) and not a probability distribution P(T_T | T_O), and we claim that when researchers interpret a temperature estimate directly as a probability distribution in that second way they are probably committing a simple error known as “confusion of the inverse”, which is incredibly common and often not hugely important, but which can and should be avoided when trying to do proper probabilistic calculations.

Going back to equation (1), you may think it can be rewritten as

T_T = T_O + e        (2)

(since -e and e have the same distribution), but this is not the same thing at all because all these terms are random variables and e is actually independent of T_T, not T_O.

Further, we show that in committing the confusion of the inverse fallacy, researchers can be viewed as implicitly assuming a particular prior for the sensitivity, which probably isn’t the prior they would have chosen had they thought about it more explicitly.
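A small numerical illustration of the difference, with entirely hypothetical numbers chosen just to make the effect visible: given an observation with known error, the "inverse reading" of the observation as a pdf for the true value agrees with a proper Bayesian update only when the prior is flat.

```python
import numpy as np

tt = np.linspace(-2.0, 6.0, 4001)   # grid over the true value T_T
t_obs, sigma = 1.6, 0.5             # hypothetical observation and error s.d.

# Likelihood P(T_O | T_T), which is what equation (1) actually defines:
likelihood = np.exp(-0.5 * ((t_obs - tt) / sigma) ** 2)

# A deliberately non-uniform prior on T_T (Gaussian centred on 3):
prior = np.exp(-0.5 * ((tt - 3.0) / 1.0) ** 2)

posterior = likelihood * prior
posterior /= posterior.sum()

# "Confusion of the inverse": reading N(t_obs, sigma^2) directly as P(T_T | T_O):
inverse_reading = np.exp(-0.5 * ((tt - t_obs) / sigma) ** 2)
inverse_reading /= inverse_reading.sum()

mean_inverse = (tt * inverse_reading).sum()
mean_posterior = (tt * posterior).sum()
print(round(mean_inverse, 2), round(mean_posterior, 2))  # 1.6 vs 1.88
```

The inverse reading silently ignores the prior; equivalently, it is only a valid posterior under an implicit uniform prior, which is the point made above.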

The manuscript had a surprisingly (to me) challenging time in review, with one reviewer in particular taking exception to it. I encourage you to read their review(s) if you are interested. We struggled to understand their comments initially, but think their main point was that when a researcher writes down a pdf for TT such as N(TO,σ²) it was a bit presumptuous of us to claim they had made an elementary error in logical reasoning when they might in fact have been making a carefully considered Bayesian estimate taking account of all their uncertainties.

While I think in theory it’s possible that they could be right in some cases, I am confident that in practice they are wrong in the vast majority of cases including all the situations under consideration in our manuscript. For starters, if their scenario was indeed the case, the question would not have arisen in the first place as all the researchers working on these problems would already have understood fully what they did and why. And one of the other cases in the manuscript was based on our own previous work, where I’m pretty confident in remembering correctly that we did this wrong 🙂 But readers can make up their own minds as to how generally applicable it is. It’s an idea, not a law.

Our overall recommendation is that people should always try to take the approach of the Bayesian calculation, as this makes all their assumptions explicit. It would have been a bit embarrassing if it had been rejected, because a large and wildly exciting manuscript which makes extensive use of this idea has just been (re-)submitted somewhere else today.  Watch this space!

I think this is also notable as the first time we’ve actually paid paper charges in the past few years – on previous occasions we have sometimes pleaded poverty, but now we’ve had a couple of contracts so that no longer really applies. Should get it free really as a reward for all the editing work – especially by jules!

Tuesday, April 21, 2020

“5-day doubling” and the great COVID-19 uncalibrated modelling fiasco

(Small edit made 21 Apr to add a corollary at the bottom of the post.)

I’ve said bits of this in various places and at various times to various people, but I don’t think I have written it down in a complete and coherent form. Actually, bits and pieces of the story have come to me at different times and in different ways, so perhaps I didn’t have a coherent story before. Anyway, it seems important and I don’t want it to get whitewashed from history, so here goes.

The story possibly starts on the 12th March, when Vallance stated that we were 4 weeks behind Italy. And also, quite specifically the peak was 3 months away:
For the UK, the peak is expected to fall in three months' time, likely in the summer months, and tail off throughout the autumn, the government said. Vallance said that the UK is around four weeks behind Italy
It’s fair to say the “4 weeks” comment was met with a bit of scepticism by the general public, eg here. And here. When the Govt’s Chief Scientist is being openly mocked for his comments, it seems to me that something is seriously wrong. For context, on the 12th March we’d had about 500 cases and 8 deaths. 15 days earlier on the 26 Feb, Italy had had very similar numbers – in fact slightly fewer cases and more deaths. In both countries, the numbers of cases and deaths were doubling roughly every 3 days, meaning we would get to Italy’s then current values of 20,000 cases and 144 deaths in about a fortnight or so (5 doublings = 32x). 4 weeks was obviously risible.

Then a few days later the 16th March, Vallance talked specifically about a 5 day doubling time (on this youtube video, starting from 27 mins in). And people were puzzled. 5 day doubling would indeed put us about 4 weeks behind Italy (ie the required 5-and-a-bit doublings would take about 26-27 days), but Italy wasn’t doubling every 5 days, and neither were we. We were both doubling on a 3 day time scale instead, possibly quicker than that.

It was actually jules who cottoned on to this first. She had been looking at the numbers more than me, and working out the doubling rate. At this point I was more thinking about the govt’s strategy to fill the streets with bodies under their “herd immunity” plan. It seemed very clear that the sheer number of critically ill people was going to be a huge burden that the NHS would have no possibility of treating, and my first blog post (which didn’t even have a proper model in, just some curves) focussed on that particular detail.

Anyway, 5 day doubling. Where did this come from? Took me a little while to work it out. It wasn’t until I got hold of the SEIR model and started playing around with it that it started to come together. Ferguson had posted a paper on the 16th March that outlined his modelling. Although his model is of course far more detailed than the SEIR model I was using, it described the parameters in enough detail to be emulated rather well by my simpler model. And….the doubling rate he had used was 5 days. You don’t need to do too much digging – or have a great deal of expert knowledge – to find it in the paper:
a 6.5-day mean generation time. Based on fits to the early growth-rate of the epidemic in Wuhan, we make a baseline assumption that R0=2.4 but examine values between 2.0 and 2.6.
What this means is that the number of cases grows by a factor of 2.4 in 6.5 days, which is equivalent to doubling in 5.1 days. They just imposed that – admittedly, the parameters were estimated from the Wuhan outbreak, but this result came from a very small data set very early on. It is also well known that the basic reproductive rate R0 depends on the social context and it’s far from certain that it would transfer directly from the Chinese denizens of a wet market to the population of the UK. To some extent, the effective duration of the period in which people pass on the infection could vary in practice too, depending on whether people go to bed or go to work etc. So there is simply no way that putting in the first estimate for Chinese parameters (with a modest uncertainty range) could be considered a robust and reliable modelling strategy, especially since there was already strong evidence that these values were not appropriate even for the later growth of the Wuhan outbreak let alone closer to home. There was ample evidence from other European countries that their doubling rates were far faster than 5 days, and growing evidence from the UK that we were following the same path.
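For anyone who wants to check the arithmetic, here is the conversion between R0, generation time and doubling time (assuming simple exponential growth where cases multiply by R0 each generation; the real SEIR dynamics differ slightly but not enough to matter here):

```python
import math

def doubling_time(R0, generation_time):
    # cases multiply by R0 every generation_time days, so
    # doubling time = generation_time * ln(2) / ln(R0)
    return generation_time * math.log(2) / math.log(R0)

def implied_R0(doubling, generation_time):
    # the inverse relation: R0 = 2 ** (generation_time / doubling_time)
    return 2 ** (generation_time / doubling)

print(round(doubling_time(2.4, 6.5), 1))  # 5.1 days: the Ferguson et al baseline
print(round(implied_R0(3.0, 6.5), 1))     # ~4.5: what a 3-day doubling implies
```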

I did a bit more playing around with my model, including parameter estimation, and quickly came to the realisation that R0 had to be rather larger than Ferguson had said.
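For those who haven’t met it, the SEIR model is just four coupled equations. A minimal sketch (with illustrative parameter values, not the exact ones from Ferguson’s paper or my fits) is enough to see why the choice of R0 dominates everything:

```python
def seir(R0, latent=4.0, infectious=2.5, N=6.7e7, I0=100, days=120, dt=0.1):
    """Minimal deterministic SEIR, Euler-integrated. Parameter values are
    illustrative only, not those of Ferguson et al or of my own fits."""
    beta = R0 / infectious              # transmission rate per infectious person
    a, g = 1.0 / latent, 1.0 / infectious
    S, E, I, R = N - I0, 0.0, float(I0), 0.0
    prevalence = []                     # daily snapshots of current infections I
    for step in range(int(days / dt)):
        inf = beta * S * I / N * dt     # S -> E flux
        inc = a * E * dt                # E -> I flux
        rec = g * I * dt                # I -> R flux
        S, E, I, R = S - inf, E + inf - inc, I + inc - rec, R + rec
        if step % int(1 / dt) == 0:
            prevalence.append(I)
    return prevalence

slow = seir(R0=2.4)   # Ferguson-style baseline
fast = seir(R0=4.5)   # value more consistent with the observed 3-day doubling
# the higher-R0 epidemic peaks far sooner, and far higher
```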

I emailed Neil Ferguson about this on the 27th, and also CCed David Spiegelhalter, on the basis that as a public-facing stats expert with a strong interest in health he’d get what I was talking about and realise it was important... and, well, they did reply which was more than I was really expecting, but only to politely brush me off. Prof Ferguson did at least acknowledge that he now thought a higher value of R0 in the range of 2.8-3 was appropriate. And true enough, the very day I emailed them, Gove had talked of a 3-4 day doubling. But that requires a rather larger R0 in the range of 3 to 4 (assuming the same 6.5 day time scale), and is still a bit slower than the data were consistently showing. Later research from IC with a simpler model pointed to a value of around 4.

As for why this matters, here are results from two model simulations. One of them – the uncalibrated one – is very close to what the IC group showed to the Government. The other one is what you get if you calibrate the model to the growth rate shown in the UK data that was available at that time.

For the red line, I used Ferguson’s model parameters and initialised the model as he described in his paper, timing the epidemic so that it had the correct number of total deaths (21) up to 14 March. For the blue one, I first fitted the parameters to the time series of cases reported in the UK, which were probably reasonably reliable up to that time as they were still tracing contacts and testing etc. Similar parameters would have been obtained from fitting to Italy, Spain and the Wuhan outbreak. I then initialised the simulation as for the red curve (daily deaths on 14th are slightly different but the integral up to that date is the same).

Want to guess which one is closer to the future observations? Well, you don’t have to. The initialisation leaves the blue line about a day or two behind reality (only!) but tracking it at the same rate. The red line…just…well. No comment. The logarithmic axis really helps to hide how far away from reality it is.
And as for why this really mattered…the red curve below was how the Ferguson et al model predicted the epidemic was going to pan out. A couple of months to the peak in infections and deaths following almost a month after that. Terrible, but still a little way away, and Vallance was saying we mustn’t suppress the epidemic too quickly.

However, in reality we were on the blue curve. A peak of over 3 million new cases per day was less than a month away. Well over 20k deaths per day at the start of May. And the govt was just shilly-shallying around.
The big puzzle for me in all this is, why on earth didn’t Ferguson calibrate his model to the 3-day doubling exponential growth rate that was clearly visible in the data? Ok, perhaps I’m a bit biased due to model calibration being basically what I have spent the last couple of decades on, but it’s a pretty fundamental component of forecasting in every field of science that you can think of. Apart from this one, it seems. Every weather forecast is generated by a model that’s tuned to look like reality, both in terms of parameters (as part of its design and commissioning) and also initialised to look like today’s weather. The epidemiologists did the latter ok – choosing a start date to fit their epidemic to start about the right time – but never bothered to tune their parameters to match the growth rate.

It will, I suspect, forever remain a mystery as to why this happened.

A small corollary of the above, added on 21 Apr: It is very straightforward to calculate the effect of a delay to the lockdown. A week of unchecked growth at 3-day doubling corresponds to a factor of 5, meaning that 80% of the total size of the first wave we are currently in could be directly attributed to the Govt delaying by a week, if it was felt that the evidence could and should have supported action that much sooner (ie, when most of the rest of Europe was taking action). That means 80% of the peak strain on the NHS, 80% of total cases and also 80% of all resulting deaths. What this calculation doesn't account for, is what happens in the longer term. We may all get it in the longer term anyway (well 60%+ of us). But we might not, and even so, the huge peak was 5x bigger than it would have been if controlled just a week quicker.
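The arithmetic is trivial to check:

```python
doubling = 3.0                      # observed doubling time in days
delay = 7.0                         # a week of unchecked growth
factor = 2 ** (delay / doubling)    # growth over that week: about 5x
avoidable = 1 - 1 / factor          # share of the wave attributable to the delay
print(round(factor, 2), round(100 * avoidable))  # 5.04 80
```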

Wednesday, April 15, 2020

Model calibration, nowcasting, and operational prediction of the COVID-19 pandemic

Title as post. Yes, this is us dipping our toes into epidemiology. Turns out that calibrating a simple model with observational data is much the same whether it’s paleoclimate or epidemics. The maths and the methods are much the same. In fact this one is a particularly easy one as the model is embarrassingly linear (once you take the logarithm of the time series). I’ve been posting my analyses on Twitter and the other blog, but since this is a real paper with words and figures and references and stuff, it can go here too (plus, I can upload a pdf here unlike blogspot).

We have been doing a very straightforward MCMC calibration of a simple SEIR model (equivalent of energy balance box model in climate science, pretty much). The basic concept is to use the model to invert the time series of reported deaths back through the time series of underlying infections in order to discover the model parameters such as the famous reproductive rate R. It’s actually rather simple and I am still bemused by the fact that none of the experts (in the UK at least) are doing this. I mean what on earth are mathematical epidemiologists actually for, if not this sort of thing? They should have been all over this like a rash. The exponential trend in the data is a key diagnostic of the epidemic and the experts didn’t even bother with the most elementary calibration of this in their predictions that our entire policy is based on. It’s absolutely nuts. It’s as if someone ran a simulation with a climate model and presented the prediction without any basic check on whether it reproduced the recent warming. You’d get laughed out of the room if you tried that at any conference I was attending. By me if no-one else (no, really, I wouldn’t be the only one).
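For the curious, the whole idea boils down to a few dozen lines. Here is a toy version in Python: synthetic Poisson death counts generated from exponential growth, and a random-walk Metropolis sampler recovering the growth rate (and hence the doubling time). It’s a sketch of the concept with made-up parameter values, not the code behind the paper.

```python
import math, random

random.seed(1)

# synthetic data: Poisson death counts around exponential growth (3-day doubling)
r_true, d0 = math.log(2) / 3, 2.0

def poisson(lam):
    # Knuth's method; fine for the modest means used here
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

data = [poisson(d0 * math.exp(r_true * t)) for t in range(21)]

def loglik(log_d0, r):
    # Poisson log-likelihood of the death series (constant terms dropped)
    ll = 0.0
    for t, k in enumerate(data):
        log_mu = log_d0 + r * t
        ll += k * log_mu - math.exp(log_mu)
    return ll

# random-walk Metropolis with flat priors: a sketch, not the paper's actual setup
cur = (0.0, 0.1)
cur_ll = loglik(*cur)
r_samples = []
for i in range(20000):
    prop = (cur[0] + random.gauss(0, 0.05), cur[1] + random.gauss(0, 0.005))
    prop_ll = loglik(*prop)
    if math.log(random.random()) < prop_ll - cur_ll:   # Metropolis accept/reject
        cur, cur_ll = prop, prop_ll
    if i >= 5000:                                      # discard burn-in
        r_samples.append(cur[1])

r_est = sum(r_samples) / len(r_samples)
print(math.log(2) / r_est)   # recovered doubling time, should come out near 3 days
```

The real calculation adds the SEIR dynamics, a change in R at the lockdown date and so on, but the machinery is exactly this.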

Anyway, the basic result is that the method works like a charm and we can reliably deduce the changes in R due to imposed controls, and it looks increasingly clear that it’s been less than 1 in the UK for several weeks now, while the experts are still talking about the peak being a couple of weeks away. The whole experience is just…so strange.

Anyway, I did try talking politely to some of the experts but just got brushed off which may partly explain the tone in the manuscript. Or maybe that’s just me 🙂

The paper has been submitted to medrxiv but who knows what they will make of it. My experience when I have poked my nose into other people’s fields has not usually been very encouraging, so I’m half expecting them to reject it anyway. So be it.

Here is today’s forecast to encourage you to read the paper.


Sunday, April 12, 2020

Reporting delays etc

Made this GIF overnight. It's showing how my forecast evolves as more data is added to the system. Initially it doesn't assume any control will be imposed, and then at the appropriate day (23rd March) it assumes a change in R (value to be estimated) which feeds through into deaths after a couple of weeks according to the model dynamics.

But something about it bugged me. Why did the model fail in mid-late March? If it's supposed to be a decent forecasting system, it should be predicting where the future data will lie. The late March data were not affected by the lockdown, it's just that the model is overestimating the pace of growth from early data. I played around with a few ideas and really didn't manage to fix it. I did, however, notice that the data jumped sharply from the 13th to 14th of March, from 1-2 deaths per day, to 10+ deaths per day, without a single day in between. This is actually pretty implausible from a statistical point of view if the underlying death rate is growing steadily in an exponential way, as theory and practice expect.
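You can put a number on "pretty implausible". Assuming Poisson-distributed daily counts with a smoothly growing mean (illustrative numbers: starting at 2 per day with a 3-day doubling, not the actual UK series), the chance of the mean passing right through the 3-9 range without a single day's count landing in it is tiny:

```python
import math

def pois_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)

def p_in_band(mu, lo=3, hi=9):
    # probability that a Poisson(mu) count lands in the 3-9 range
    return sum(pois_pmf(k, mu) for k in range(lo, hi + 1))

# daily mean deaths growing with a 3-day doubling time, crossing the band
r = math.log(2) / 3
mus = [2 * math.exp(r * t) for t in range(8)]   # ~2/day rising to ~10/day

p_never = 1.0
for mu in mus:
    p_never *= 1 - p_in_band(mu)
print(p_never)   # of order 1e-4: jumping straight past 3-9 is very unlikely
```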

So I went and had a more careful look at the data. 

These data are actually not, as people might assume, the number of people who have died on a given day. They are actually the number of reports collected in a given 24h period, which may represent deaths that occurred on any previous day. I already knew this and also knew that it shouldn't affect the growth rate estimate, so long as the delays are fairly consistent over time. This can be checked with an alternative data set in which deaths are actually collated by true date of death rather than date of report, and this is what the next plot does:

Oops I didn't label the axes properly. Deaths vs dates. The red/pink circles are the same as the previous data, that is to say, the number in the daily report that features on the news each day. The blue/cyan triangles, on the other hand, are the deaths that are known to have actually occurred on each day. The first thing to note is that the blue/cyan points are for the most part higher, and that's despite these numbers only relating to England, so the UK as a whole will be another 10-20% higher still. There is a drop-off towards the present day where the totals are probably not yet complete and some to-be-added deaths will turn out to have occurred on these days. This is specifically warned about in the data. Ignoring these points, the slopes of the two data sets are strikingly similar just as theory expects (which is good, as I was relying on this for my analyses to make sense). These are the dotted pink and cyan lines, with the extent of each line showing the points I used to derive it. So far, so good.

So now look at the initial segment where I have drawn a couple of bolder lines in red and dark blue. These are linear fits to the darker blue and red data points, and their slopes are quite different. The blue one agrees with the cyan - I also extended this with the thin solid cyan line and it's coincidentally (confusingly) identical to the dotted one. The red one, on the other hand, extends as the solid pink line and clearly misses the future data. Just as my slightly more complex model fit did. You can also see that the blue dots show that no fewer than 5 days actually had 3-9 deaths inclusive, despite there being none of these in the red data. My fit using the full model is not actually just a linear regression but it's rather similar, and creates the same effect.

My conclusion is that the reason my prediction struggles here is that the red data were just particularly poor at this point, and this wasn't due either to bad luck with the randomness of death, or to the model not representing the underlying dynamics well (the blue data are perfectly linear), but instead almost certainly to some weird reporting errors being significantly larger than I had allowed for in my estimation. Because the chance of not getting a single day in that range of intermediate values is extremely low in a world where we had roughly a whole week with the expected death rate in that range. I don't know if they actually changed the system around that time or not - might do a bit more digging.

Friday, April 10, 2020


A couple of weeks ago, commenter David Young was scathing of the threat of COVID-19. Look at the EUROMOMO web page, he said. 
"there is no evidence I could see that mortality is above the expected numbers over the last few weeks."
"Italy does show a spike the last few weeks but well below the peak in 2016-17."
Moreover, this is all backed up by impeccable analysis from Nic Lewis and we know how good he is. He has looked at the Diamond Princess, which had only 8 deaths out of 700 cases, mostly old, and without a single death in the 60-69 age bracket he has shown the fatality rate is negligible.

Well, let's see how this stacks up a full 2 weeks later.

Week 14 in EUROMOMO is now out. Click for full size.

They've literally had to rescale their axes for multiple countries to keep the death numbers on the plot. Belgium, France, Italy, Spain, Switzerland and England are all at or above their highs from the cold 2016/17 spell. And this is after a few weeks of lockdown that has stopped the death rates from zooming up higher.

And as for the Diamond Princess, there are actually 12 deaths now from the 700 cases, including one in the 60-69 age range of 200 cases. An empirical fatality rate of 0.5%. According to David, Nic said the death rate was 0.11% for this age group, but what's a factor of 4 between friends?

I expect David will appear shortly and acknowledge that he was wrong and that Nic massively underestimated the death rate and that in fact mortality across much of Europe is at an extremely high level despite strong attempts to suppress the epidemic. I can see his flying pig coming in to land right now..

Thursday, April 09, 2020

Updated method and forecasts

I've tweaked the method just a little since my earlier forecasts. 

You may have noticed that the initialisation of my short-term forecasts was fairly poor. This is an inevitable consequence of naively fitting parameters in a simple imperfect deterministic model using a long time series. It gives good overall parameter estimates but the inevitable limitations of the model mean that the final state (from which we look ahead in a forecast) isn't truly a realistic estimate. This can easily be seen in my very first forecast for the UK for example. The spread of model trajectories is quite wide and biased high compared to the last few data points:

A consequence is that even if the trend is perfect, the one-day forecast will be quite poor. We weren't really going to jump from the observed 621 to 1900 deaths in a single day for example, but the day-ahead forecast spread reached that high a value. Various fudges can be used to address this problem: using anomalies is one, but that's problematic during exponential growth and won't satisfy the nonlinear model dynamics. The route I have chosen is to weight the more recent data more highly in the fit. The justification for this is that it is a reasonable approximation to allowing the model to forget older data as would happen automatically if I used a stochastic model with sequential updating, or otherwise represented dynamical model error in a more explicit manner. It clearly works at a heuristic level and I'm working on how best to calibrate the effect and implement it in a more technically justifiable manner. I suppose I may end up publishing this somewhere, so I ought to make sure it is defensible.
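The weighting idea itself is simple to sketch: an exponentially-decaying weight on the age of each data point in an otherwise ordinary least-squares fit to the log counts. This toy version (made-up data and an arbitrary decay scale, not my actual implementation) shows the effect: when the growth rate changes partway through the series, the weighted fit tracks the recent rate while a flat fit blurs the two regimes together.

```python
import math

def weighted_growth_rate(counts, tau=3.0):
    """Least-squares slope of log(counts) vs time, with each point
    down-weighted by exp(-age/tau); tau is an arbitrary illustrative choice."""
    T = len(counts) - 1
    pts = [(t, math.log(c), math.exp(-(T - t) / tau))
           for t, c in enumerate(counts) if c > 0]
    sw = sum(w for _, _, w in pts)
    mt = sum(w * t for t, _, w in pts) / sw        # weighted mean time
    my = sum(w * y for _, y, w in pts) / sw        # weighted mean log-count
    return (sum(w * (t - mt) * (y - my) for t, y, w in pts)
            / sum(w * (t - mt) ** 2 for t, _, w in pts))

# synthetic series: growth slows from 3-day to 10-day doubling halfway through
r_old, r_new = math.log(2) / 3, math.log(2) / 10
series = [10 * math.exp(r_old * t) for t in range(15)]
series += [series[-1] * math.exp(r_new * t) for t in range(1, 16)]

print(weighted_growth_rate(series, tau=3))    # close to the recent (slower) rate
print(weighted_growth_rate(series, tau=1e6))  # flat weighting blurs the regimes
```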

Last week I looked at Lombardy, and argued that the lockdown was probably working. Here is a re-run of my forecast, withholding the last 7 data points, using a version of the updated method.

You see those cyan flashing circles? They are the 7 latest data points, which were not used in any way in this estimation. Promise!

For giggles, have a look at what IC forecast for Italy last week, and how well that turned out. Each of their forecasts run from Monday to Sunday, the last 4 weeks of experimental forecasts have obs plotted alongside (under!) them and the pink plume is what they forecast just a few days ago for this week. That last one might be just about ok.

 Here is what my method predicted for all of Italy starting from the same Sunday as their most recent failed forecast, again with validation data plotted:

I must admit I am impressed by how well it has worked, though it should be noted that at the heart of it, it's basically a 3 parameter fit for a model that generates two straight lines (in log-space) with a bend joining them so not a terribly difficult problem. If you know the shape of the bend and how it is formulated, there's a good chance you can predict roughly where and when it's going to stop bending. The remarkable thing is that the experts at IC haven't managed to achieve this. I find it baffling that they are quite so hopeless. There are about 50 or 60 of them co-authoring that report, and they do it for a living. 
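To make the "two straight lines with a bend" point concrete, here is the generic shape of such a curve in log space: a smooth softplus bend of fixed width joining growth rate r1 to rate r2 around time t0. (My actual model gets the bend from the SEIR dynamics; this is just the cartoon version, with invented numbers.)

```python
import math

def log_cases(t, a, r1, r2, t0, s=3.0):
    """Two straight lines in log space, slopes r1 then r2, joined by a
    softplus bend of width s centred at t0. A cartoon of the epidemic
    curve's shape, not the actual SEIR-derived trajectory."""
    x = (t - t0) / s
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))  # numerically stable
    return a + r1 * t + (r2 - r1) * s * softplus

# growth at a 3-day doubling, bending over to slow decay after day 30
f = lambda t: log_cases(t, a=2.0, r1=math.log(2) / 3, r2=-0.05, t0=30)
early_slope = f(1) - f(0)     # ~r1: still on the rising line
late_slope = f(61) - f(60)    # ~r2: past the bend, on the falling line
print(early_slope, late_slope)
```

Once the early slope and the bend location are pinned down by the data, extrapolating where the curve flattens out is not a hard problem.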

Here is another example, my forecast for Spain, initialised a week last Sunday, versus the IC group forecast at the same date (the rightmost greenish plume with dots below was initialised at the same date).

And here is the current UK forecast...before today's figure comes out.

This is the IC forecast for the UK for this week again (pink plume again, below). The data were already outside their range by yesterday. What on earth were they thinking?