## Wednesday, April 14, 2021

### Speed vs power in Zwift

This may be of interest to a relatively small number of readers, but it seems worth documenting that the relationship between power and equilibrium flat speed in the cycling simulator Zwift can be quite accurately summarised via

P =  1.86e-02 w.v - 5.37e-04 v^3 + 2.23e-05 w.v^3 + 1.33e-05 h.v^3

where v is the velocity in kph, w is the rider weight in kg, and h is the rider height in cm.

The linear term in v can be thought of as arising through rolling resistance (which also varies with w), with the three cubic terms due to air resistance. These cubic terms can be thought of as the dominant terms in a Taylor series expansion of a single term that looks like A.f(w,h).v^3 where f is a function of weight and height that modifies the resistance (eg though changing the cross sectional area). At first I was trying to work out what f was, but an important realisation that only came to me while doing this analysis is that I don't actually need to know its form as the values of w and h only deviate moderately from their mean values for the practical range of riders I'm interested in (ie ± 10% the most part, 20% at worst). Therefore this linearisation approach (with coefficients fitted through linear regression) is plenty good enough and I don't need any of my model-fitting tricks. More engineering than science but nevertheless useful!

To do the model fitting, I did a bunch of flying laps of the volcano circuit at constant power, with different physical parameters and varying power level each time. This route is fairly flat but not perfectly so, which means the average speed here will be a little bit lower than that achieved on truly flat ground, but probably typical of many flattish routes on Zwift such as Watopia's Waistband or Greater London Flat. I estimate the elevation/disequilibrium effect here to be around 0.5kph, so speeds achieved on Tempus Fugit may be about that much quicker than indicated here (or conversely, you'll hit a target pace with a bit less power than this formula suggests). Some of the riders in my data set are real, others imaginary. I've focussed mostly on women, first because I've been DS for my wife's team for a while, and also because through being a large reasonably fit man I can generate their racing power fairly comfortably for long enough to get a fix on their speeds. (Yes, I know there are software approaches to simulating the power. But it's something else to set up, and I don't really want to get into the world of power bots, you never know where it might lead...) Calculating the power needed for a large rider at high speed requires a bit of an extrapolation and may get less reliable. Bike is the Tron, I started out testing different bikes (to check on what zwiftinsider says) but the differences were too trivial to pursue. Specifically, the Canyon Aeroad 2021 with Zipp 808 wheels which I used to use a lot was just one second slower than the Tron. That's 0.1kph, equivalent to less than 2W.

The black lines in the plot below are the model predictions for each rider, with the crosses marking the data points that I used to fit the model. Each line has 3 data points except for the top one which is my own physics. If someone wants to do a flying lap of the volcano at 450W (using my physical parameters) I'd love to know the result :-) The rest are mostly based around a women's team with jules being the bottom line. Few cyclists of either gender lie outside the range of our parameters! The model-data residuals are about 1.5W on average (RMS error) which is basically the magnitude of measurement error on the speed which is only precise to 0.1kph. This level of precision is plenty good enough for practical use, it's difficult to hit a power target more closely than about 5W anyway.

A conclusion that may be drawn is that for a medium-sized cyclist riding around 42kph, an extra 1kg of weight requires 2.5W more power to maintain the same speed (or alternatively, 1kg less saves 2.5W of power). For an additional 1cm of height, it's around 1W. These numbers aren't far from what I'd estimated through experience, it's nice to have them confirmed in a more careful calculation.

## Tuesday, March 23, 2021

### History in the (re)making?

One year on and there's been a slew of articles revisiting the events of the past year. I was going to ask what has prompted this little flurry, but it's obviously the anniversary thing. With increasing pressure for a public inquiry, it seems that some of the key players have been trying to position themselves favourably, so let's have a look at what's been written, versus what the contemporaneous documentation actually says. SAGE minutes can be found here, I think (I downloaded the relevant docs a while back).

The first article I noticed was Laura Kuenssberg's “Inside Story”. She “talked to more than 20 of the people who made the life and death decisions on Covid”. The relevant passage that I am interested in concerns the decision making around mid-March:

“13 March, the government's Scientific Advisory Group for Emergencies (Sage) committee concluded the virus was spreading faster than thought.

But it was Downing Street "modellers in the building", according to one current official, who pored again over the numbers, and realised the timetable that had only just been announced was likely to result in disaster.

The next morning, a small group of key staff got together. Simple graphs were drawn on a whiteboard and the prime minister was confronted with the stark prediction that the plan he had just announced would result in the NHS collapsing under the sheer number of cases.

Several of those present tell me that was the moment Mr Johnson realised the urgency - that the official assumptions about the speed of the spread of this new disease had been wrong.

[...]

On 16 March, the public were told to stop all unnecessary social contact and to work at home if possible.

[...]

For many inside government, the pace of change that week was staggering - but others remain frustrated the government machine, in their view, had failed to move quickly enough.”

The narrative being presented here of ponderous government is significantly misleading.

The govt claimed at the time to be paying close attention to the scientific advice from SAGE, and the specific change to SAGE's assessment on the 13th March was not that the disease was spreading any more rapidly, but merely that the number of infections was higher than previously thought (due to greater importation from abroad). This is a key distinction that anyone numerate should be able to grasp readily. To quote from SAGE minutes on the 13th:

“Owing to a 5-7 day lag in data provision for modelling, SAGE now believes there are more cases in the UK than SAGE previously expected at this point, and we may therefore be further ahead on the epidemic curve, but the UK remains on broadly the same epidemic trajectory and time to peak.
[...]
SAGE was unanimous that measures seeking to completely suppress spread of Covid- 19 will cause a second peak.”
Changing the estimate of the number of cases just brings the peak forward by a few days. Even a factor of 2 is only a single doubling time which they thought to be about 5-7 days at that time. Changing the estimate of the growth rate could (and in fact did) change the timetable and urgency much more significantly, but this didn't happen for another week and a half.

It is not clear who “the modellers in the building” refers to in Kuenssberg's piece, but they are clearly not SAGE. Maybe Cummings had run a few numbers on a spreadsheet but since SAGE was supposed to be an assembly of world-leading experts, it would hardly be appropriate to discard their analyses in favour of his. For that matter, I had also blogged that the mitigation plan was likely to overwhelm the NHS (a conclusion that I reached around the 9th March based on some very simple calculations) but I wouldn't expect Johnson to listen to me either. SAGE minutes are very clear that they still believed the doubling rate to be 5-7 days right up to the 18th March and had described any overload on the NHS as being some way off (albeit a looming problem that would need addressing at some time in the future). They were unanimously (see above) opposed to suppression at this point.

On the 16th, the SAGE meeting changed its advice somewhat and suggested that some social distancing measures (but not school closures) should be implemented promptly:

“SAGE advises that there is clear evidence to support additional social distancing measures be introduced as soon as possible.
[...]

SAGE will further review at its next meeting whether, in the light of new data, school closures may also be required to prevent NHS capacity being exceeded.”
Clearly there was some increased urgency here but NOT any indication that the NHS was under immediate threat, in direct contradiction to Kuenssberg's unattributed claim above that “the prime minister was confronted with the stark prediction that the plan he had just announced would result in the NHS collapsing under the sheer number of cases.” I'm not saying it is impossible that anyone said such a thing, but if they did, they were an isolated voice and certainly not representative of SAGE as a whole.

Immediately following the SAGE meeting on the 16th, the Govt did of course request that people avoid all unnecessary social contact. Admittedly, this instruction had neither legal force nor economic support at that point but SAGE was obviously reasonably satisfied with the adequacy of this plan as can be seen from their minutes of the 18th (at which time they also recommended school closures):
“SAGE advises that the measures already announced should have a significant effect, provided compliance rates are good and in line with the assumptions. Additional measures will be needed if compliance rates are low.”
So it was only in the case of poor compliance that additional measures would be required.

There was no SAGE meeting between 18th and 23rd, which was unfortunate in the circumstances (21-22 being a weekend). On the 23rd, SAGE finally realised that they had got the R number wrong and that as a result the doubling time was much shorter than had been previously believed, making the situation quite desperate. Specifically, the SAGE meeting of the 23rd concluded: “Case numbers could exceed NHS capacity within the next 10 days on the current trajectory” and this statement must be understood in the context of the immediately preceding 20th March SPI-M meeting which noted both: “Any measures enacted would take 2-3 weeks to have an impact on ICU admissions” and also: “If the higher reproduction number is representative of the longer term, then it is likely that additional measures will be required to bring it below one”.

Thus SAGE's underestimate of the R number didn't just mean that the epidemic was coming faster and harder than previously thought: another consequence is that actions that would have been adequate for R=2.4, might not be adequate for R=3. It is quite understandable that this caused alarm within SAGE, but it only happened on the 23rd.

The Govt imposed a legally-enforceable lockdown with much more far-reaching restrictions immediately that evening (23rd March).

Moving on to the next article, in the Guardian, a hagiography of Patrick Vallance:

“But it now seems clear that Boris Johnson, and his advisers, were slow to heed Vallance’s early advice.

Before the 16 March press conference, Vallance chaired a meeting of the Scientific Advisory Group for Emergencies (Sage) in which a collection of experts had advised that the first lockdown should begin immediately.

Johnson did not announce the unprecedented national lockdown until a week later on 23 March in a primetime TV address to the nation.”

This is simply not true as documented above. SAGE asked for relatively modest action around the 16-18th, and the Govt responded promptly. SAGE explicitly assessed on the 18th that the actions were probably adequate and it was only on the 23rd when they realised that they had got the doubling time wrong, that they suddenly realised they had a much larger and more urgent problem on their hands. Vallance also got this wrong in his appearance before the House of Commons Select Committee on Science and Technology

Most recently, a podcast on the Guardian consisting of an interview of Neil Ferguson. He points very firmly to the data about higher case numbers due to greater importation being what drove the accelerated decision making in mid-March (NB this view is very different from Vallance who very emphatically linked the change in policy advice to the revision of the estimated doubling time - it is simply not possible for both Ferguson and Vallance to both be correct about this). Ferguson mentions this being discussed in the “first weekend in March” which I'm sure must be a simple slip as this would be 7-8th March whereas on the 10th and even 13th SAGE seems pretty sanguine about the situation and does not suggest any need to take immediate action. Assuming he meant the 14-15th March instead, this is far more consistent with SAGE as the minutes of the 16th do certainly suggest some some action should be taken in the light of the new data:

“The science suggests additional social distancing measures should be introduced as soon as possible.”

When asked specifically (at 11m20 in the podcast) “were scientists telling ministers to go earlier?” Ferguson firstly points again to the surveillance data as escalating the decision making process, and then coyly says it was entirely in the Govt's hands as to what actions they took. He could have said, but chose not to, that the Govt followed SAGE's advice promptly and to the letter. And the interviewer didn't pursue the point. While the improved surveillance data undoubtedly played a role in the process, the urgent advice for the most stringent controls only came on the 23rd as a result of the revised estimate of doubling time. You only have to glance at the SAGE minutes to see that they were not shy about offering policy advice throughout the outbreak.

At 16m40 onwards the interviewer says, with reference to the situation in September after schools reopened:

“...once again the advice from scientists was to lock down. But that advice was not heeded. Did that delay once again lead to a higher death rate than we might have seen?”

Without getting into the September story here, any delayed response from the Govt (which I don't dispute was evident in the autumn) could only “once again” have resulted in a higher death rate if there had also been a delayed response to advice to lock down in March. Which there was not, according to the evidence I have outlined.

## Monday, March 01, 2021

### BlueSkiesResearch.org.uk: Escape velocity

There is currently lot of debate on how and when we lift restrictions, and the risks of this. There are several unknowns that may affect the outcome. I have extended the model in a couple of simple ways, firstly by including a vaccination effect which both immunises people, and substantially reduces the fatality rate of those who do get ill, and also by including a loss of immunity over time which is potentially important for longer simulations. The magnitudes of these effects seem highly uncertain, I’ve just made what seems like plausible guesstimates. I use a vaccination rate of 0.5% per day which is probably in the right ballpark though my implementation is extremely simplistic (NB this is the rate at which people move from the vulnerable to the immune category, so it directly accounts for the imperfect performance of the vaccine itself). As well as this, I’m assuming the fatality rate for those infected drops down to 0.3% as vaccination progresses through the most vulnerable groups, since we’ve heard so many good things about vaccination preventing serious illness even in those who do get ill. This value must also account for the proportion of victims that have not been vaccinated at all, so it’s really a bit of a guess but the right answer has to be significantly lower than the original fatality rate. The loss of immunity in this model occurs on a 1 year time scale, which in practice due to model structure means 1/365 = 0.27% of the immune population return to the vulnerable state each day. I don’t claim these numbers are correct, I merely hope that they are not wrong by a factor of more than about 2. In the long term in the absence of illness, the balance between vaccination and loss of immunity loss would lead to about 1/3rd of the population being vulnerable and 2/3rds being immune at any given time. This is just about enough to permanently suppress the disease (assuming R0=3), or at least keep it at a very low level.

The model simulates the historical trajectory rather well and also matches the ONS and REACT data sets, as I’ve shown previously, so I think it’s broadly reasonable. The recent announcements amount to an opening of schools on the 8th March, and then a subsequent reopening of wider society over the following weeks and months. In the simulations I’m about to present, I’m testing the proposition that we can open up society back to a near-normal situation more quickly. So after bumping the R number up on the 8th March I then increase it again more substantially, putting the underlying R0 number up to 2.5 in the ensemble mean, close to (but still lower than) the value it took at the start of last year, with the intention being to simulate a return to near-normal conditions but with the assumption that some people will still tend to be a bit on the cautious side. So this is a much more ambitious plan than the Govt is aiming for. I’m really just having a look to see what the model does under this fairly severe test. Here is the graph of case numbers when I bump the R number up at the end of April:

And here is the equivalent for deaths, which also shows how the R number rises:

So there is another wave of sorts, but not a terrible one compared to what we’ve seen. In many simulations the death toll does not go over 100 per day though it does go on a long time. Sorry for the messy annotations on the plots, I can’t be bothered adjusting the text position as the run length changes.

If we bring the opening forward to the end of March, it’s significantly worse, due to lower vaccination coverage at that point:

Here the daily deaths goes well over 100 for most simulations and can reach 1000 in the worse cases. On the other hand, if we put off the opening up for another couple of months to the end of July, the picture is very much better, both for cases:

and deaths:

While there are still a few ensemble members generating 100 deaths per day, the median is down at 1, implying a substantial probability that the disease is basically suppressed at that point.

I have to emphasise the large number of simplifications and guesstimates in this modelling. It does however suggest that an over-rapid opening is a significant risk and there are likely benefits to hanging on a bit longer than some might like in order that more people can be vaccinated. My results seems broadly in line with the more sophisticated modelling that was in the media a few days ago. To be honest it’s not far from what you would get out of a back-of-the-envelope calculation based on numbers that are thought to be immune vs vulnerable and the R0 number you expect to arise from social mixing, but for better or worse a full model calculation is probably a bit more convincing.

While the Govt plan seems broadly reasonable to me, there are still substantial uncertainties in how things will play out and it is vitally important that the govt should pay attention to the data and be prepared to shift the proposed dates in the light of evidence that accrues over the coming weeks. Unfortunately history suggests this behaviour is unlikely to occur, but we can live in hope.

## Tuesday, January 19, 2021

### BlueSkiesResearch.org.uk: So near and yet not quite…

There’s been quite an amazing turnaround since my last blog post. At the time I wrote that, the Govt was insisting that schools would open as planned (indeed they did open the very next day), and that another lockdown was unthinkable. So my grim simulations were performed on that basis.

Of course, the next evening, we had another u-turn… schools shut immediately and many other restrictions were introduced on social mixing. Even so, most of the experts thought we would be in for a rough time, and I didn’t see any reason to disagree with them. The new variant had been spreading fast and no-one was confident that the restrictions would be enough to suppress it. Vaccination was well behind schedule (who remembers 10 million doses by the end of the year?) and could not catch up exponential growth of the virus.

Just after I posted that blog, someone pointed me to this paper from LSHTM which generated broadly similar results with much more detailed modelling. Their scenarios all predicted about 100k additional deaths in the spring, with the exception of one optimistic case where stiff restrictions starting in mid-December, coupled to very rapid vaccination, could cut this number to 30-40k. Given that we were already in Jan with no lockdown and little vaccination in sight, this seemed out of reach. Here is the table that summarises their projections. Note that their “total deaths” is the total within this time frame, not total for the epidemic.

However, since that point, cases have dropped very sharply indeed. Better than in the most optimistic scenario of LSHTM who anticipated R dropping to a little below 1. Deaths have not peaked quite yet but my modelling predicts this should happen quite soon and then we may see them fall quite rapidly. The future under suppression looks very different to what it did a couple of weeks ago.

So this was the model fit I did back on 3rd Jan, which assumes no lockdown. Left is cases, right is deaths which rises to well over 1k per day for a large part of early 2021.

And here are the cumulative median infections and deaths corresponding to the above, with some grid lines marked on to indicate what was in store up to the end of Feb (for infections) and end of March (for deaths). As you can see, about 100k of the latter in this time frame (ie 186-73 = 113k additional deaths).

Here now are the graphs of the latest model fit showing the extremely rapid drop in cases and predicted drop in deaths assuming a 6 week lockdown:

And here is the resulting median projection for total cumulative infections and deaths as a direct comparison to the previous blog post:

It’s a remarkable turnaround, and looks like we are on track for about 114-86 = 28k additional deaths (to start of April), which is far lower than looked possible a couple of weeks ago. It seems plausible that an large part of the reason for the striking success of the suppression is that the transmission of the new variant was predominantly enhanced in the young and therefore closing schools has had a particularly strong effect. The assumption that lockdown lasts for 6 weeks, and what happens after it, is entirely speculative on my part but I wanted to test how close we were to herd immunity at that point. Clearly there will be more work to be done at that time but it shouldn’t be so devastating as at present, unless we lose all our immunity very rapidly.

So that’s looking much better than it was. However it’s also interesting to think about what might have happened if the Govt had introduced the current restrictions sooner. Moving the start of the lockdown back by three weeks generates the following epidemic trajectory:

and the resulting cumulative infections and deaths look like:

Due to the automatic placing of text it’s not so easy to read but we end up with about 84000 deaths total (to end of March) which is fewer than we’ve already had.

So the additional 30k deaths seems to be the price we paid for Johnson’s determination to battle the experts and save Christmas.

## Thursday, January 07, 2021

### BlueSkiesResearch.org.uk: Not even half-way there

NB: below calculation was made and published on Monday 4th, when schools had just been opened and before the latest lockdown was announced, so we can hope that it is too pessimistic, but the situation is still very challenging.

Lots of talk from politicians and others that, while not exactly triumphalist, is certainly very positive and enthusiastic about the prospect of vaccinating ourselves out of trouble. Here’s one of the earliest "light at the end of the tunnel" articles for example. And with the new Oxford/AZ vaccine there is renewed excitement.

Sorry to pour a little bit of cold water on the mood but a bit of perspective is called for.

Here is what my modelling suggests for the progress of the outbreak though the population so far and into the future. It’s not a pretty sight. The total number that may be infected between now and the start of March (less than 2 months away) is more than the entire number that have been infected so far right from the start of the outbreak last Feb/March.

According to these calculations, roughly 15 million have been infected, and a total of roughly 36 million may be by the time it’s over. That is, we have significantly more infections to come, than we have seen so far. And far more than we got in the first wave last spring, when probably something like 10% (my modelling actually says 8%) of the population was infected.

If there ever was a time to stay at home and minimise all unnecessary contact, it most surely is now.

With this rate of spread, vaccinating a few million over the next couple of months has a relatively minor effect. It may reduce the death rate significantly towards the end of this period (and will certainly help the small minority of highly vulnerable people who receive it), but won’t stop the disease spreading widely.

I need to add a few disclaimers about the modelling. This result plotted above is the median of my latest ensemble fit of a simple SEIR model to historical data on deaths and cases. I’ve been modelling the progress of the outbreak for months now and though the model is rather primitive and approximate it has done a pretty decent job of simulating what is actually quite a simple process. If each infected person passes the disease on to more than one other (on average), then the disease grows exponentially, if they pass it on to less than one (on average) then it shrinks exponentially. The more difficult bits (that my model is too simple to attempt) is to predict the effect of specific restrictions such as closing schools or pubs, or determining how many young vs old people get ill. When just looking at total numbers, this simple SEIR model (when carefully used) works better than it probably should.

This simulation, while it fits the historical data well, may not account adequately for the added virulence of the newer strain that has recently emerged. It also assumes that we don’t have an extremely strict lockdown that successfully suppresses the outbreak in the very near future. Reality could end up better than this, or it could end up worse, but I’m pretty confident that the basic message is robust. People are getting infected at a huge rate right now. Stay at home if you can.

## Tuesday, December 22, 2020

### The new variant

The summary is, yes it's real, yes it's serious, and the early signs are that it could be very difficult to maintain any semblance of suppression in the next month or two. Vulnerable people would be well advised to put off and/or cut down any risky activities at least for now, especially with vaccination just around the corner. It's not necessarily any worse if you catch it, but it is spreading rapidly and will be much more likely to infect you. The one glimmer of good news is that the vaccination seems to be progressing smoothly and at a decent pace, though obviously the sooner we can get more supplies of them, the better.

When the new variant was announced last week, I assumed (along with many others, I think) that its impact was probably exaggerated to cover up the govt's embarrassment over the Christmas U-turn. It was only a couple of days previously that Johnson had derided Starmer for suggesting that the Plaguefest plans should be reconsidered. Starmer had been the Grinch that wanted to Cancel Christmas! And now....it was Johnson doing the cancelling. Such is the effect of the torrent of lies and excuses and u-turns that we've been subjected to over recent months and even years. But....there is a new variant, and it is growing.

It has been traced back to September - one thing the UK actually can legitimately claim to be good at is genetic sequencing a sizeable proportion of samples - and has been getting more and more common since then. Here's an assessment from the ECDC on the 20th Dec. The blue line on this figure shows the proportion of samples containing the new variant, over several weeks (right hand scale).

It's been doubling every week over the second half of this period, a little quicker than that in fact. This isn't itself enough to tell us how fast is is growing in absolute terms, but the total number of positive cases was fairly flat over the relevant portion of this interval, maybe declining a little at the very end (the last week, number 47, is probably around 16-22 Nov). So that means it's close to doubling every week in absolute terms, which would work out at an R number of about 1.7, over a period of lockdown when the underlying R number for general infections was close to 1. Another set of data generates a somewhat lower estimate, but it's still clearly higher for the new variant than the old one:

The proportion of new variant roughly doubled in three weeks, but the total number of new cases was also well up over this period, meaning that the absolute growth rate for the new variant was faster than this.

A more comprehensive analysis was published last night by PHE and this is the key figure:

Each dot is the ratio of weekly growth rates for the new variant versus older type, over several weeks (colour coded) and different regions. The blue line is a fitted mean and it's clearly above 1. The scale is supposed to represent the ratio of R numbers for the two variants and they have made a slightly embarrassing error on this calculation which they may excuse as a reasonable approximation though there's no real reason to have done it as it's not really much harder to have done it correctly. It doesn't really matter so long as no-one considers the numerical result to be some precise truth. The point is that the new variant has a substantially larger growth rate, sufficient for case numbers to have actually been rising in absolute numbers during the latter part of the last "lockdown" in London. They get a typical advantage of about 0.5 in additive terms, or you could call it 1.7 as a multiplicative factor (these data mostly relate to during lockdown when R<1 was a reasonable assumption for the old variant). How the different R numbers relate may depend in part on the background restriction in place, it's not something really amenable to precision.

So the exact answer is uncertain, but it's quite plausible that the R number for this new variant could currently be as high as 1.5 now nationally or even higher, which suggests that it might be very hard to get it below 1 on any sort of sustainable basis. We certainly have a race against time to get as far through the vaccination list as quickly as possible. This calculator suggests that a 70 year old person with no additional health conditions might get vaccinated as early as Feb (though the estimate of 1 million per week is surely a bit of a guess). 80 year olds by late Jan. Even getting that far could cut total deaths in half:

(from ActuaryByDay who is pretty reliable though this figure assumes 100% effectiveness which is obviously a generous estimate, scaling the values by 0.9 to account for this would be an improvement and maybe another 0.8(??) to account for incomplete take-up.)

While we roll out the vaccine, it's surely worth re-emphasising the one obvious lesson of the last 9 months - that it's better to act quickly than to hope in vain that it's all going to turn out fine. If this new variant turns out to be anywhere near as bad as these data suggest, we should be acting with extreme urgency to buy ourselves as much time as possible. If it turns out to not be as bad as I'm suggesting, we can drop the restrictions quickly and the additional harm will be modest. Evidence is coming in quickly but the cost of delay is rising exponentially.

### BlueSkiesResearch.org.uk: Science breakthrough of the year (runner-up)

Being only a small and insignificant organisation, we would like to take this rare opportunity to blow our own trumpets.

Blue Skies Research contributed to one of the runners-up in Science Magazine’s “Breakthrough of the year” review! Specifically, the estimation of climate sensitivity that I previously blogged about here.

Obviously, were it not for the pesky virus, we would have won outright.

## Monday, November 09, 2020

### BlueSkiesResearch.org.uk: Modelling the ONS COVID data

I’ve grumbled for a while about the ONS analyses of their infection survey pilot (pilot? isn’t it a full-blown survey yet?) without doing anything about it. The purpose of this blog is to outline the issue, get me started on fixing it (or at least presenting my own approach to an analysis) and commit me to actually doing it this time. There are a couple of minor obstacles that I’ve been using as an excuse for several weeks now and it’s time I had a go.

The survey itself seems good – they are regularly testing a large "random" cohort of people for their infection status, and thereby estimating the prevalence of the disease and how it varies over time. The problem is in how they are doing this estimation. They are fitting a curve through their data, using a method know as a "thin plate spline." I am not familiar with this approach but it’s essentially a generic smooth curve that attempts to minimise wiggles.

There are two fundamental problems with their analysis, which may be related (or not) but are both important IMO. The first is that this smooth curve isn’t necessarily a credible epidemic curve. Epidemics have a particular dynamical form (you can think of them as locally exponential for the most part, though this is a bit of an oversimplification) and while variation in R over time gives rise to some flexibility in the outcome of this process, there are also inevitably constraints due to the way that infections arise and are detected. In short, the curve they are fitting has no theoretical foundation as the model of an epidemic. In practice it has often appeared to me that the curves have looked a bit unrealistic though of course I don’t claim to have much experience to draw on here!

The second fundamental problem is the empirical observation that their analyses are inconsistent and incoherent. This is illustrated in the following analysis of their Sept 4th results:

Here we see their on the right their model fit (plume) to the last few months of data (the data themselves are not shown). The dot and bar is their estimate for the final week which is one of the main outputs of their analysis. On the left, we see the equivalent latest-week estimates from previous reports which have been produced at roughly weekly intervals The last dot and bar is a duplicate of the one on the right hand panel. It is straightforward to superimpose these graphics thusly:

The issue here is that (for example) their dot and bar estimate several reports back centred on the 3rd August, has no overlap with their new plume. Both of these are supposed to be 95% intervals, meaning they are both claimed to have a 95% chance of including the real value. At least one of them does not.

While it’s entirely to be expected that some 95% intervals will not contain reality, this should be rare, occurring only 5% of the time. Three of the dot and bar estimates in the above graphic are wholly disjoint with the plume, one more has negligible overlap and another two are in substantial disagreement. This is not just bad luck, it’s bad calibration. This phenomenon has continued subsequent to my observation, eg this is the equivalent from the 9th Oct:

The previous dot and bar from late Sept is again disjoint from the new plume, and the one just after the dotted line in mid-August looks to be right on the limit depending on graphical resolution. In the most recent reports have changed the scaling of the graphics to make this overlaying more difficult, but there is no suggestion that they have fixed the underlying methodological issues.

The obvious solution to all this is to fit a model along the lines of my existing approach, using a standard Bayesian paradigm, and I propose to do just this. First, let’s look at the data. The spreadsheet that accompanies the report each week gives various numerical summaries of the data and the one that I think is most usable for my purposes is the weighted fortnightly means in Table 1d which take the raw infection numbers and adjust to represent the population more accurately (presumably, accounting for things like different age distributions in the sample vs the national population). Thanks to the weekly publication of these data, we can actually create a series of overlapping fortnightly means out of two consecutive reports and I’ve plotted such a set here:

It’s not quite the latest data, I’m not going to waste effort updating this throughout the process until I’ve got a working algorithm at which point the latest set can just slot in. The black circles here are the mean estimates, with the black bars representing the 95% intervals from the table.

Now we get to the minor headaches that I’d been procrastinating over for a while. The black bars are not generally symmetric around the central points as they arise from a binomial (type) distribution. My methods (in common with many efficient approaches) require the likelihood P(obs|model) to be Gaussian. The issue here is easy illustrated with a simple example. Let’s say the observations on a given day contain 10 positives in 1000 samples. If the model predicts 5 positives in a sample of 1000, then it’s quite unlikely we would obtain 10: P(O=10|m=5) = 1.8%. However if the model predicts 15 positives, the chance of seeing 10 is rather larger: P(O=10|m=15) = 4.8%. So even though both model predictions are an equal distance from the observation, the latter has a higher likelihood. A Gaussian (of any given width) would assign equal likelihood to both 5 and 15 as the observations are equally far from either of these predictions. I’ve wondered about trying to build in a transformation from binomial to Gaussian but for the first draft I’ll just use a Gaussian approximation which is shown in the plot as the symmetric red error bars. You can see a couple of them actually coincide with the black bars, presumably due to rounding on the data as presented in the table. The ones that don’t, are all biased slightly low relative to the consistent positive skew of the binomial. The skew in these data is rather small compared to that of my simple example but using the Gaussian approximation will result in all of my estimates being just a fraction low compared to the correct answer.

Another issue is that the underlying sample data contribute to two consecutive fortnightly means in these summaries. A simple heuristic to account for this double-counting is to increase uncertainties by a factor sqrt(2) as shown by the blue bars. This isn’t formally correct and I may eventually use the appropriate covariance matrix for observational uncertainties instead, but it’s easier to set up and debug this way and I bet it won’t make a detectable difference to the answer as the model will impose strong serial correlation on this time scale anyway.

So that’s how I’m going to approach the problem. Someone was previously asking for an introduction to how this Bayesian estimation process works anyway. The basic idea is that we have a prior distribution of parameters/inputs P(Î¦) from which we can draw an ensemble of samples. In this case, our main uncertain input is a time series of R which I’m treating as Brownian motion with a random daily perturbation. For each sample of Î¦, we can run the model simulation and work out how likely we would be to observe the values that have been seen, if the real world had been the model – ie P(Obs|Î¦). Using these likelihood values as weights, the weighted ensemble is directly interpretable as the posterior P(Î¦|Obs). That really is all there is to it. The difficulties are mostly in designing a computationally efficient algorithm as the approach I have described may need a vast ensemble to work accurately and is therefore sometimes far too slow and expensive to apply to interesting problems. For example, my iterative Kalman smoother doesn’t actually use this algorithm at all, but instead uses a far more efficient way of getting to the same answer. One limitation that it requires (as mentioned above) is that the likelihood has to be expressed in Gaussian form.

## Tuesday, September 15, 2020

### SAGE versus reality

Something I've been meaning to do for a while is look at how well the SAGE estimates of the growth rate of the epidemic have matched up to reality over the long term. For the last 3 months now, SAGE have published a weekly estimate not only of R but also the daily growth rate, which is actually a more directly interpretable number (as well as being provided to a higher degree of precision). What I have done is taken their estimate of daily growth rate and integrated it over time. And plotted this against the number of cases actually reported.

Here we are:

The solid blue line is the central estimate from SAGE, with the dashed lines calculated using the ends of the range they published each week. Red is the weekly mean number of cases over this time period, with this line scaled to start at the same place in week 1 (ending on Friday 19 June). Latest SAGE estimate in this plot is from Friday 11 Sept.

Agreement was very good for the first few weeks, with case numbers going down at the rate described by SAGE of about 3% per day. But then the case numbers started to drift up in July...and SAGE continued to say the epidemic was getting smaller. Over the last few weeks the discrepancy has grown sharply. Note that the dashed lines assume the extreme edge of the range presented by SAGE, week after week - so this would require a consistent bias in their methodology, rather than just a bit of random uncertainty.

Honesty compels me to point out that the comparison here is not completely fair, as the number of cases may not be a consistent estimate of the size of the outbreak. Some of the rise in cases may be due to increased testing. However the discrepancy between case numbers and the mean SAGE estimate is now a factor of 10 compared to the starting point of this analysis. That's not due to better testing alone!

## Saturday, September 12, 2020

### Weekly RRRRRRReport

A lot of different estimates of the growth rate (R) of the epidemic have come out in the last couple of days, so here's a summary of which ones are wrong (and why) and which ones you can believe. And who am I to do this, you might reasonably ask? While not an epidemiologist, my professional expertise is in fitting models to data, which is precisely what this question demands. And the available evidence suggests I'm rather better at it than many epidemiologists appear to be.

As you may recall, a month ago I posted an argument that R really couldn't be under 1 any longer, and the epidemic was starting to grow again. At the time, the "experts" of SAGE were still insisting that R was less than 1, and they kept on claiming that for a while, despite the very clear rise in reported case numbers. The rise has continued and indeed accelerated a bit, other than for a very brief hiatus in the middle of last month. Part of this steady rise might have be due to a bit more testing, but it's been pretty implausible to believe that all of it was for a while now. I'll come back to SAGE's ongoing incompetence later.

I'll start with my own estimate which currently comes out at R= ~1.3. This is based on fitting a very simple model to both case and death data, which the model struggles to reconcile due to its simplicity. The average death rate (as a percentage of infected people) has dropped in recent weeks, thanks to mostly younger people being infected recently, and perhaps also helped by some improvements in treatment. I could try to account for this in the model but haven't got round to it. So it consistently undershoots the case numbers and overshoots deaths a bit, but I don't think this biases the estimate of R enough to really matter (precisely because the biases are fairly constant). Incidentally, the method I'm using for the estimation is an iterative version of an ensemble Kalman smoother, which is a technique I developed about 15 years ago for a different purpose. It's rather effective for this problem and clearly superior to anything that the epidemiologists are aware of. Ho hum.

Here are my plots of the fit to cases (top) and deaths (bottom) along with the R number.

As pointed out, these graphs need better annotation. Top graph is modelled daily infections (blue plume), modelled daily cases (green plume with some blue lines sampled from the ensemble and the median shown as magenta) and case ascertainment ratio which is basically the ratio of these (red plume, RH scale). Reported case numbers are the red circles. Bottom graph is modelled deaths (green plume with lines again) with data as red circles. Red plume here is the effective R number (RH scale). R number and case ascertainment are the fundamental parameters that are being fitted in my approach. Infection fatality rate is fixed at 0.75%.

So far, so good. Well, bad, but hopefully you know what I mean.

Another relevant weekly analysis that came out recently is the infection pilot survey from ONS. Up to now it's been pretty flat and inconclusive, with estimates that have wobbled about a little but with no clear signal. This all changed with their latest result, in which the previous estimate of 27,100 cases (uncertainty range 19,300 - 36,700) in the week of 19 - 25 Aug increasing to 39,700 (29,300 - 52,700) in the week 30 Aug - 5 Sept. That is a rise of 46% in 11 days or about 3.5% per day. R is roughly the 5-day growth rate (for this disease), so that corresponds to an R value of 1.2, but note that their analysis doesn't extend over the past week when the cases have increased more sharply.

Actually, I don't really think the ONS modelling is particularly good - it's a rather arbitrary curve-fitting exercise - but when the data are clear enough it doesn't matter too much. Just looking at the raw data that they kindly make available, they had almost 1 positive test result per 1000 participants over the fortnight 23 Aug - 5 Sept (55 cases in 59k people) which was 65% up on the rate for the previous fortnight of 26 cases in 46k people. Again, that works out at R=1.2.

A rather worse perspective was provided by SAGE, who continue to baffle me with their inability to apply a bit of common sense and update their methods when they repeatedly give results so strikingly at odds with reality. They have finally noted the growth in the epidemic and managed to come up with an estimate marginally greater than 1, but only to the level of R=1.1 with a range of 1-1.2. And even this is a rounding-up of their estimate of daily growth rate of 1 ± 2% per day (which equates more closely to R=1.05 with range of 0.95-1.15). Yes, they really did say that the epidemic might be shrinking by 1% per day, even as cases are soaring and hospital admissions are rising. I do understand how they've managed to generate this answer - some of the estimates that feed into their calculation only use death data, and this is still very flat - but it's such obvious nonsense that they really ought to have pulled their heads out of their arses by now. I sometimes think my work is a bit artificial and removed from practical issues but their unwillingness to bend to reality gives ivory tower academics a bad name.

At the other extreme, a paper claiming R=1.7 was puffed in the press yesterday. It's a large survey from Imperial College, that bastion of incompetent modelling from the start of the epidemic. The 1.7 number comes from the bottom right hand panel in the below plot where they have fitted an exponential through this short subset of the full time series of data. There is of course a lot of uncertainty there. More importantly, it doesn't line up at all with the exponential fitted through the immediately preceding data set, starting at a lower level than the previous curve finishes. While R might not have been constant over this entire time frame, the epidemic has certainly progressed in a continuous manner, which would imply the gap is filled by something like the purple line I've added by hand.

It's obviously stupid to pretend that R was higher than 1 in both of the recent intervals where they made observations, and just happened to briefly drop below 1 exactly in the week where they didn't observe. The sad thing about the way they presented this work to the media is that they've actually done a rather more sensible analysis where they fit the 3rd and 4th intervals simultaneously, which is shown as the green results in the 3rd and 4th panels on the top row of the plots (the green in the 3rd panel is largely overlain by blue which is the fit to 2nd and 3rd intervals, but you can see if you click for a larger view). Which gives.....R=1.3. Who'd have thought it?

Of course R=1.7 is much more headline-grabbing. And it is possible that R has increased towards the end of their experimental period. Rather than fitting simple exponentials (ie fixed R values) to intervals of data, perhaps a more intelligent thing to do would have been to fit an epidemiological model where R is allowed to vary through time. Like I have been doing, for example. I'm available to help and my consultancy rates are very reasonable.

In conclusion, R=1.3(ish) but this is a significant rise on the value it took previously and it might well be heading higher.