An interesting example I spotted on Andrew Gelman's blog.

UK readers will remember a medical test where 6 people took a particular drug and all had an extreme life-threatening reaction ("cytokine storm", whatever that means). Apparently there were also 2 controls, who were not treated, and who (surprise) did not suffer the reaction.

But...with only 8 samples in total, the results are barely significant in frequentist terms. Perhaps the simplest way of analysing the result is to ask the following: since 6 people out of 8 fell ill, and given the null hypothesis that the treatment and control outcomes are probabilistically identical, what is the probability that the 6 ill people would coincide with the 6 treated people? This is a simple combinatorial question, the answer to which is 1/28 or 3.6% (there is some more detailed discussion at the link about the correct test to use). So it is just significant at the p<0.05 threshold but not p<0.01. Given the number of medical trials taking place, we should expect such failures regularly.
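The combinatorial count above is easy to verify. A minimal sketch in Python (not part of the original post), using the stdlib `math.comb`:

```python
from math import comb

# Number of ways to choose which 6 of the 8 subjects fall ill.
ways = comb(8, 6)  # 28

# Under the null hypothesis, each assignment is equally likely, and
# exactly one of them puts all 6 illnesses on the 6 treated subjects.
p = 1 / ways
print(f"p = 1/{ways} = {p:.3f}")  # p = 1/28 = 0.036
```

This is the same number a one-sided Fisher's exact test would give for this 2x2 table.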

But we don't, of course. The reason is that our prior expectation of someone naturally having such a life-threatening reaction, absent any provocation, is so low as to be virtually zero. Any plausible Bayesian updating of the prior belief P(treatment is harmful) in the light of the observed data is going to massively increase this probability, because the likelihood of the data under the alternative hypothesis (that the reactions occurred by chance) is vastly lower. And this is obviously what all the researchers and commentators have actually done in practice, even if not explicitly and precisely.

E.g., let's model the test as having two possibilities: either the drug is harmful (all subjects will suffer the reaction) or it is not (the reaction has the background probability 0.0001 per person, surely an overestimate). Given an extremely complacent prior belief that the test is harmless with probability 0.999, the posterior after all 6 test subjects have reacted is given by:

P(test is harmful) = 1 × 0.001 / (1 × 0.001 + 0.0001^6 × 0.999) = 1, to as many significant digits as I can be bothered writing. That's a very trivial analysis of course, but real maths is hard to do in Blogger (no LaTeX facility).
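For anyone who wants to check the arithmetic, the same Bayes' rule update can be written out numerically (a sketch using the values in the text, nothing more):

```python
# Prior: drug harmless with probability 0.999, harmful with 0.001.
p_harmful = 0.001
p_harmless = 0.999

# Likelihood of all 6 subjects reacting under each hypothesis.
lik_harmful = 1.0 ** 6      # harmful drug: every subject reacts
lik_harmless = 0.0001 ** 6  # background rate 0.0001 per subject, independently

# Bayes' rule: P(harmful | data).
posterior = (lik_harmful * p_harmful) / (
    lik_harmful * p_harmful + lik_harmless * p_harmless
)
print(posterior)  # prints 1.0 (the difference from 1 is ~1e-21, below double precision)
```

The posterior differs from 1 by roughly 10^-21, so at ordinary floating-point precision it simply prints as 1.0.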

## 9 comments:

If the odds of each person getting sick are 0.0001, or 1/10^4, then the odds of all 6 getting sick independently are 1/10^24 - reasonably defined as "impossible" in the real world.

The two controls reduce the possibility that the subjects were sickened by something else in their laboratory environment, such as a gas leak, or ebola.

The odds that the 6 would get sick and the 2 would not, given a problem in the laboratory environment, would seem to be 3.6%, as you write.

But until you quantify the odds of that sort of environmental problem you can't get a final answer of the total odds of this happening. The odds of such a laboratory problem would appear to be very small, but not nearly as small as 1/10^24.

Actually, the odds would seem to be 3.6% of the odds that a problem would hit exactly 6 of the subjects. Which is even lower odds than I suggested above. But still well above 1/10^24. Of course, if there's a gas leak or ebola infection, we'd expect the researchers to be subject to the risk as well, depending on how and when the facility is occupied (maybe it takes 24 hours for the gas to make one ill, and the researchers have 8-hour shifts).

Of course my example calc was rather trivial - I was just trying to give a flavour of things. E.g., the fact that there is a clear link between the action of the drug and the nature of the illness is also highly relevant.

But James,

Aren't you misusing statistics in the same way as Professor Meadows? See:

http://www.timesonline.co.uk/tol/news/uk/article536728.ece

No, if I had scoured the world looking for an ill person and then blamed the illness on some arbitrary factor (or their inherent evilness) I would have been making his error (at least, one of his errors). But there were only 6 people who took the drug in the first place.

Your analysis is not Bayesian. The extra information about the probability of a life-threatening reaction occurring under non-drug conditions is part of the likelihood model. It does not imply a prior distribution about the effect of the drug.

Well, I never claimed I had done more than a rather trivial and naive exploration of the way the data could be analysed, but I'm not sure I understand your point. I did write:

"Given an extremely complacent prior belief that the test is harmless with probability 0.999"

which seems to me to be a pretty clear statement of (strong) prior belief that the drug is harmless.

My point was not to criticize your analysis - it is indeed simple, but informative and correct. I was criticizing the characterization of this analysis as being Bayesian, and supposedly demonstrating "the importance of Bayesian analysis".

The analysis you made regarding the chance of becoming mortally ill is a standard likelihood analysis, which would enable you to find a very significant p-value and reject the null hypothesis (i.e., that the drug is not dangerous) in a rigorous frequentist framework.

Like any such likelihood analysis, you can then add prior information and make Bayesian statements about posterior probabilities.

Since the substantive argument can be made without the prior information (and is in fact more convincing that way, since it does not involve arbitrary prior models), this case does not serve as a demonstration of the importance of Bayesian analysis. It does demonstrate the importance of using appropriate likelihood models.

Ah, yes I see now. Thanks for the clarification.
