Showing posts sorted by relevance for query prosecutor's fallacy.

Sunday, July 16, 2006

More on detection, attribution and estimation 3: The Prosecutor's Fallacy

The error of equating P(Data|Hypothesis) and P(Hypothesis|Data) is known as the Prosecutor's Fallacy, due to its frequent appearance in criminal trials (misinterpretation of DNA evidence etc). (The dispute on that Wikipedia page seems to refer to the details of its applicability to a particular legal case, not the underlying theory.) Typically, it is illustrated via a simple discrete yes/no question along the following lines: if the probability of a random person matching a DNA sample from a crime scene is 1 in 1,000,000, then what is the probability that a suspect is guilty, given only that their DNA matches? The fallacious answer is 999,999 in 1,000,000. An easy way to see the flaw in this is to note that in the UK, there are 60,000,000 people, so there will be about 60 people whose DNA matches, only 1 of whom will be the guilty one (note various other assumptions I've made, including that a crime actually took place at all, that it was committed by one person, and that there is no other evidence as to the suspect).
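For anyone who prefers to see the arithmetic spelled out, here is a minimal sketch of the DNA example in Python. It assumes exactly what the illustration assumes (one culprit, no other evidence, every UK resident equally likely a priori) plus one assumption of my own for the sketch: that the guilty person matches with certainty. The variable names are purely illustrative.

```python
# Assumptions: one culprit, no other evidence, a uniform prior over the
# UK population, and (an assumption of this sketch) the culprit always matches.
population = 60_000_000
p_match_given_innocent = 1 / 1_000_000
p_match_given_guilty = 1.0

prior_guilty = 1 / population
prior_innocent = 1 - prior_guilty

# Bayes' Theorem: P(guilty | match) = P(match | guilty) P(guilty) / P(match)
p_match = (p_match_given_guilty * prior_guilty
           + p_match_given_innocent * prior_innocent)
posterior_guilty = p_match_given_guilty * prior_guilty / p_match

# Expected number of innocent people who match purely by chance.
expected_innocent_matches = (population - 1) * p_match_given_innocent

print(f"expected innocent matches: {expected_innocent_matches:.1f}")
print(f"P(guilty | match) = {posterior_guilty:.4f}")
```

Counting the culprit alongside the roughly 60 chance matches, the posterior comes out at about 1 in 61, which is the same ballpark as the back-of-envelope figure and nowhere near 999,999 in 1,000,000.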

Formally, the exact calculation of P(H|D) when we are given P(D|H) requires Bayes' Theorem:
P(H|D)=P(D|H)P(H)/P(D)
which requires the specification of a "prior" P(H) (P(D) is a normalisation constant which provides no real theoretical difficulties, although it might be hard to calculate in practice). It must be understood that this equation does not depend on "being a Bayesian" or "being a frequentist". It is simply a law of probability, which follows directly from the axioms (in particular, P(D,H)=P(D|H)P(H)=P(H|D)P(D)). So it's not something we can choose to obey or not - at least, without abandoning any pretence that we are talking about probability as it is usually understood.

Although the prosecutor's fallacy is generally demonstrated through discrete probability, Bayes' Theorem applies equally to continuous probability distribution functions, with f(h|d) being related to f(d|h) via f(h|d)=f(d|h)f(h)/f(d). This explains the distinction between confidence intervals and credible intervals demonstrated in the last post, since an experimental observation gives us f(d|h) (a likelihood function), and in order to turn it into a posterior pdf f(h|d) we need to use a prior f(h).

For example, given the previous apple-weighing example, we might have a prior belief that the apple will weigh about 100g, plus or minus 20g at 1 standard deviation (and strictly speaking, the prior should be truncated at 0). The likelihood function arising from the measurement is itself a Gaussian shape centred on the observed 40g, with a width of 50g - this function does extend to negative values, as a hypothetical negative-mass apple would have a nonzero probability of returning a 40g measurement. Applying Bayes' Theorem formally gives us the well-known result of optimal interpolation between two Gaussians, which in this case works out to 91.7 ± 18.6g. In this case, the observation is so poor that it hardly affects our prior belief, but if our scales had an error of only 5g we'd obviously depend far more on their output. In no case would we end up believing that the apple's mass was negative!
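The apple numbers are easy to check. The sketch below uses the standard precision-weighted combination of a Gaussian prior with a Gaussian likelihood; it ignores the truncation of the prior at 0, which is an approximation, and the function name is just something I made up for illustration.

```python
import math

def combine_gaussians(prior_mean, prior_sd, obs, obs_sd):
    """Posterior mean and sd for a Gaussian prior and a Gaussian likelihood.

    Precisions (inverse variances) add, and the posterior mean is the
    precision-weighted average of the prior mean and the observation.
    Truncation of the prior at zero is ignored (an approximation).
    """
    prior_precision = 1.0 / prior_sd ** 2
    obs_precision = 1.0 / obs_sd ** 2
    post_precision = prior_precision + obs_precision
    post_mean = (prior_mean * prior_precision + obs * obs_precision) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)

# Prior 100g +/- 20g, observation 40g with a 50g error.
poor_mean, poor_sd = combine_gaussians(100.0, 20.0, 40.0, 50.0)
print(f"poor scales: {poor_mean:.1f} +/- {poor_sd:.1f} g")

# Same prior and reading, but scales with an error of only 5g.
good_mean, good_sd = combine_gaussians(100.0, 20.0, 40.0, 5.0)
print(f"good scales: {good_mean:.1f} +/- {good_sd:.1f} g")
```

The first case reproduces the 91.7 ± 18.6g quoted above. With the 5g scales the posterior moves almost all the way to the observation (about 43.5 ± 4.9g).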

Next, and perhaps last (for now at least): what the literature says.

Sunday, February 19, 2006

73,000,000:1

This is the famously erroneous statistic quoted by Roy Meadow, supposedly giving the odds of two children dying of SIDS (cot death) in the same family, when appearing as an "expert witness" in a murder trial. His "evidence" neatly illustrates two distinct fallacies in the same calculation, and as well as resulting in some wrongful convictions which were eventually overturned, he was struck off by the General Medical Council, but has just been reinstated following an appeal.

Firstly the fallacies: he got his number by squaring the odds of a single child dying in this way, of about 8,500:1. The implicit assumption is that such deaths strike entirely at random. However, if there are any genetic or environmental factors involved, the probability of a second death in the same family would be far greater. Ignoring this factor is known as the ecological fallacy. In fact medical research suggests a second death happens in about 1 out of 100 cases where a first death has happened.
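To see how much the independence assumption matters, here is a rough sketch using only the two rounded figures quoted above. Note that squaring the rounded 8,500 gives about 72 million rather than exactly 73 million; presumably the headline figure came from a less rounded input, but that is a guess on my part.

```python
single_death_odds = 8_500      # "about 8,500:1", as quoted above
second_given_first = 1 / 100   # "about 1 out of 100 cases", as quoted above

p_single = 1 / single_death_odds

# Meadow's approach: treat the two deaths as independent and square.
p_two_independent = p_single ** 2

# Allowing for dependence: P(first) * P(second | first).
p_two_dependent = p_single * second_given_first

print(f"independence assumed: 1 in {1 / p_two_independent:,.0f}")
print(f"with dependence:      1 in {1 / p_two_dependent:,.0f}")
print(f"understated by a factor of about {p_two_dependent / p_two_independent:.0f}")
```

On these numbers the squared figure understates the probability of two deaths by a factor of about 85, and that is before the prosecutor's fallacy even enters the picture.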

A second and more pernicious fallacy is the prosecutor's fallacy, so called because it seems pretty near ubiquitous in the presentation of statistical evidence (eg DNA tests). It is the incorrect interpretation of a probability of an event happening to an innocent person, as implying the probability that this person is in fact guilty.

An example: assume that you are a juror, and you've been told that DNA testing provides a match to the suspect at the 1,000,000:1 level (ie, a random person will match with probability 1 in 1,000,000). What is the probability that this person committed the crime? 999,999/1,000,000? No! Given a population of 60,000,000 in the UK, there will be about 60 people whose DNA matches that well. A priori, any match only indicates a 1 in 60 chance that a particular person did it - if there's no other evidence, this is all but worthless. However, it would be a rare juror who would understand this (and probably a rare lawyer). In practice, matches are often quoted at a much higher level of significance, but by the time we are up to billions to one, the chances of accidental contamination or fraud must be higher than that anyway.

Back to Meadow. He was reinstated recently, with the judge ruling that expert opinion given in court should be "privileged" in the sense that incompetence is not grounds for the GMC to punish the witness. While that may seem rather bizarre, I note that no-one has called for the defence lawyers to be themselves disbarred, for failing to produce any expert witness who understood elementary statistics well enough to destroy (that aspect of) Meadow's evidence. That is surely incompetence on a similar level to Meadow's.

Friday, September 01, 2006

What is probability?

I happened to come across a somewhat off-hand question "what exactly does it mean to assign probabilities for a single event?" during some random blog-surfing a few days ago. I thought it was widely accepted that such probabilities are essentially Bayesian, that is, subjective expressions of the degree of belief of a person in the proposition in question (eg as Stefan Rahmstorf writes). There are, to be sure, practical difficulties in accessing this belief in a precise and consistent manner (especially if people are prepared to lie), and personal probabilities may change from minute to minute and day to day, but the basic theory seems clear enough and forms the foundation of a large field of research with many practically useful outputs. One thing that is certainly clear (and I believe undisputed) is that the main competing interpretation (frequentism) cannot apply at all in such situations. So if you want to talk in probabilistic terms at all, you've simply got to go outside that framework, and the standard Bayesian angle seems the obvious one.

Anyway, today I finally got the Reply from Allen and Frame to our attempted Comment. [This had been accidentally omitted from the set of reviews that were sent a couple of weeks ago.] I don't intend to publish and fisk it in detail - that would be tedious, lengthy and no-one would care. However, since it was offered for publication, they can hardly complain about me making a couple of comments on it.

One striking sentence in particular jumped out at me:
"We do not think most scientists interpret probabilistic forecasts purely as expressions of degrees of belief."
(And just to clarify, the context makes it clear that this is not intended as a snide comment about the ignorance of "most scientists", but rather as support for A&F taking this same position.)

While they are being admirably clear and frank in acknowledging that they do not actually believe the estimates that they have published, it does rather raise the issue of what they consider the status of their probabilistic estimates to be.

Although I do favour what I understand to be the standard subjective Bayesian viewpoint for non-frequentist probability, I'm not dogmatically going to insist that it is the only possible one - philosophers and mathematicians have argued for centuries over probability, and I don't pretend to have all the answers or to have covered all the bases. Note, however, that Wikipedia only mentions 2 broad categories, Bayesian and Frequentist - any others seem to be rather esoteric philosophical finesses of these two, not major revolutions (excluding imprecise probability which is a whole new can of worms wholly irrelevant to this discussion). Salmon (1966) proposes three criteria for a proposed interpretation of probability:
  1. Admissibility or coherence (must satisfy the Kolmogorov axioms).
  2. Ascertainable (there's a method for calculating it)
  3. Applicable (useful in real life applications)
Obviously, whatever A&F's interpretation is, it fails on admissibility - a point which they have also explicitly acknowledged (indeed they claim it as a feature rather than a bug). Failure on point 2 is therefore a gimme - through being multi-valued (see my previous example on P(x>4) ≠ P(x4>34)) their methods also fail ascertainability, since any answer can be generated by reformulating the question in logically equivalent ways (hmm...I can see a semantic dodge here - does the answer "whatever you want it to be" count as a method for calculating their probability? I'll leave them to decide on that). All that needs to be shown is that their results are not useful and they'll have a 0/3 score :-) Of course this just all means that they think Salmon is wrong too, I guess...but more importantly, it leaves unanswered the question of what their version of probability actually is. What axioms does it satisfy (if any)? What does it mean?

According to their Reply (and indeed the referee who supported them), all this is entirely clear to all climate scientists (except us, I guess) and needs no further clarification. I'd be interested to hear from anyone, climate scientist or not, who can make head or tail of it!

There's a further funny point which I can't resist mentioning. Their Reply makes much of the fact that the D&A stuff (to which Myles Allen is a major contributor) routinely commits the Prosecutor's Fallacy in turning the (frequentist) confidence intervals that classical D&A methods produce, into the (Bayesian) probability intervals that people really want to see. But rather than being embarrassed by this, they use it to justify their claim that a uniform prior is in fact the appropriate choice! It really is Emperor's New Clothes stuff.

Tuesday, March 03, 2015

Climate change by numbers

...is the title of an interesting TV programme that was on BBC 4 last night. It is quite amazing that they dared to show such a maths/stats/science-heavy programme at prime time, albeit on a minor channel, so I will start by commending them for that (the inevitable grumbles follow later). The three numbers they featured were the 0.85C warming since the 1880s, 95% confidence that anthropogenic influence had caused most of this, and the 1 trillion tonnes of carbon that would take us to about 2C warming. I think it was originally planned to be three 30 minute programmes, but they ran it all together as one long piece, which seemed to work well to me.

I think they told the stories in an engaging manner. There was also lots of interesting historical stuff about how our understanding of the climate system has developed, which was mostly very well done and would probably have been even more interesting had I not already known it! But of course I was hardly the target audience.

In fact one of the researchers making the programme contacted me last year to talk about Bayesian vs frequentist approaches to detection and attribution, specifically the IPCC's statement attributing most of the warming of the last century to anthropogenic effects. Unfortunately I wasn't able to be very encouraging about the idea of explaining the differences between Bayesian and frequentist approaches to the general public. After all, most climate scientists struggle with this question, as is demonstrated by the IPCC's misrepresentation of D&A results! I've written on this (really must update my web pages, that link won't last for ever...or will it?) but the argument has little traction even in the climate science community because most people are quite content to continue in their comfortably-erroneous way.

Anyway, the Bayesian thing didn't make it into the transmitted programme, which I was neither surprised nor disappointed about, as I really can't see how to present it in such a way that the general public would get anything out of it. And the traditional misrepresentation of the probability of observations more extreme than observed given the null, as the probability of the null given the observations, was heavily featured (that's basically where the 95% comes from). Sigh. But what I really want to grumble about most strongly was the garbled and nonsensical representation of Kalman filtering in the first section, which, contrary to the claims in the programme, is not a method to check observations against each other and has not been used for temperature data homogenisation. The Kalman filter is actually used for updating a model prediction with new observations, and this is how it was used for space navigation. That is, based on current estimates of velocity and position at time t1, the equations of motion are used to predict the new position and velocity at subsequent time t2, and then imperfect observations of the position at t2 are used to update the estimates of position and velocity, and so on ad infinitum.
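For the curious, the predict/update cycle just described can be sketched in a few lines of plain Python. This is a toy one-dimensional constant-velocity tracker under my own assumptions (the noise variances `q` and `r`, the starting guess and the simulated "truth" are all made-up illustrative values), not anything taken from the programme or from a real navigation system.

```python
import random

def kalman_step(state, cov, z, dt, q=1e-5, r=1.0):
    """One predict/update cycle for a constant-velocity model.

    state = (position, velocity); cov = (Ppp, Ppv, Pvv), the entries of the
    symmetric 2x2 error covariance; z = noisy position observation at the
    new time; r = observation error variance; q = process-noise variance
    (a hypothetical tuning value chosen for this sketch).
    """
    pos, vel = state
    ppp, ppv, pvv = cov

    # Predict: advance the estimate from t1 to t2 with the equations of motion.
    pos_pred = pos + vel * dt
    vel_pred = vel
    ppp_pred = ppp + 2.0 * dt * ppv + dt * dt * pvv + q
    ppv_pred = ppv + dt * pvv
    pvv_pred = pvv + q

    # Update: correct the prediction using the imperfect position observation.
    innovation = z - pos_pred
    s = ppp_pred + r               # variance of the innovation
    k_pos = ppp_pred / s           # Kalman gain for position
    k_vel = ppv_pred / s           # Kalman gain for velocity
    new_state = (pos_pred + k_pos * innovation, vel_pred + k_vel * innovation)
    new_cov = ((1.0 - k_pos) * ppp_pred,
               (1.0 - k_pos) * ppv_pred,
               pvv_pred - k_vel * ppv_pred)
    return new_state, new_cov

# Simulated truth: an object moving at 1.5 units per step, observed with noise.
rng = random.Random(0)
state, cov = (0.0, 0.0), (100.0, 0.0, 100.0)   # vague initial guess
true_pos, true_vel, dt = 0.0, 1.5, 1.0
for _ in range(50):
    true_pos += true_vel * dt
    z = true_pos + rng.gauss(0.0, 1.0)
    state, cov = kalman_step(state, cov, z, dt)

print(f"estimated position {state[0]:.2f}, velocity {state[1]:.2f}")
```

Note that only position is ever observed: the velocity estimate is improved entirely through the covariance between the position and velocity errors that the prediction step builds up. Nothing here compares observations against each other or weeds out bad ones.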

Ok, pedants may observe that NCEP has pioneered the use of an ensemble Kalman filter for its 20th century reanalysis project, but this is somewhat tangential to climate change and their results, interesting as they are, have their own homogenisation problems and are hardly central to the debate on global warming. Ironically, Doug McNeall (who was involved as a scientific consultant, I'm not blaming him for anything in particular though) tweeted a link to the Wikipedia page on Kalman filtering, which is a much better resource for anyone interested in learning more about the topic. Anyway, I'm really baffled as to where this bit came from - maybe they just couldn't resist a link to “rocket science” :-) Or did someone think “filter” might be related to filtering out bad data? Well, it isn't.

The “pixel sticks” were very clever, but I don't really think a line graph is improved by drawing it on wobbly axes, especially if a straight line trend is then drawn through the data! I wonder if Doug will feature that on his Better Figures blog :-) And as for the presenters spending most of their time walking away from the camera...I'm probably sounding like a grumpy old man so I'd better stop. As I said, I think it was pretty good overall, but if you want a mathematical/statistical programme that really doesn't make any concessions to dumbing down, and that does cover climate change (and Bayesian statistics) on occasion, I strongly recommend “More or Less” on Radio 4.

Update: Oh, this is interesting. It's a blog post about the programme from the mathematician (Norman Fenton) who presented the 95% section. Turns out he is actually a Bayesian who clearly understands how that number is tarnished by the prosecutor's fallacy, and he argues that the scientific debate would be improved by a greater use of Bayesian methods!