OK, it’s answer time for these questions (also here on this blog). First, a little background. This is the paper, or rather, here it is to download. The questions were asked of over 100 psychology researchers and 400 students and virtually none of them got all the answers right, with more wrong than right answers overall.
The questions were modelled on a paper by Gigerenzer who had done a similar investigation into the misinterpretation of p-values arising in null hypothesis significance testing. Confidence intervals are often recommended as an improvement over p-values, but as this research shows, they are just as prone to misinterpretation.
Some of my commenters argued that one or two of the questions were a bit unclear or otherwise unsatisfactory, but the instructions were quite clear and the point was not whether one might think the statement probably right, but whether it could be deduced as correct from the stated experimental result. I do have my own doubts about statement 5, as I suspect that some scientists would assert that “We can be 95% confident” is exactly synonymous with “I have a 95% confidence interval”. That’s a confidence trick, of course, but that’s what confidence intervals are anyway. No untrained member of the public could ever guess what a confidence interval is.
Anyway, the answer, for those who have not yet guessed, is that all of the statements were false, broadly speaking because they were making probabilistic statements about the parameter of interest, which simply cannot be deduced from a frequentist confidence interval. Under repetition of an experiment, 95% of confidence intervals will contain the parameter of interest (assuming they are correctly constructed and all auxiliary hypotheses are true) but that doesn’t mean that, ONCE YOU HAVE CREATED A SPECIFIC INTERVAL, the parameter has a 95% probability of lying in that specific range.
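To make the coverage statement concrete, here is a minimal sketch (my own illustration, not part of the quiz, with arbitrary numbers) of repeating an experiment many times and computing an ordinary 95% interval for a normal mean with known standard deviation:

```python
# Repeat an experiment many times; each time, compute the standard 95% z-interval
# for the mean of a normal sample with known sd, and check whether it contains
# the true mean. All the numbers here are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
true_mean, sd, n, reps = 5.0, 2.0, 30, 100_000

hits = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sd, n)
    half_width = 1.96 * sd / np.sqrt(n)          # known-sd 95% interval
    hits += abs(sample.mean() - true_mean) <= half_width

print(hits / reps)  # close to 0.95: the *procedure* has 95% coverage,
                    # but any single realised interval either contains
                    # the true mean or it does not.
```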
In reading around the topic, I found one paper with an example similar to my own favourite. We can generate valid confidence intervals for an unknown parameter with the following procedure: with probability 0.95, say “the whole number line”, otherwise say “the empty set”. If you repeat this many times, the long-run coverage frequency tends to 0.95, as 95% of the intervals do include the true parameter value. However, for any given interval, we can state with absolute certainty whether the parameter is inside or outside it, so we will never be able to say, once we have generated an interval, that there is a 95% probability that the parameter lies inside that interval.
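A tiny sketch of that procedure (again my own, with an arbitrary parameter value) shows the long-run coverage coming out at 0.95 even though every realised interval is either certainly right or certainly wrong:

```python
# The "silly but valid" 95% procedure: with probability 0.95 report the whole
# number line, otherwise report the empty set. Coverage is 0.95 by construction.
import numpy as np

rng = np.random.default_rng(1)
theta = 3.7          # the unknown parameter; its value is irrelevant here
reps = 100_000

covered = 0
for _ in range(reps):
    whole_line = rng.random() < 0.95   # True: report (-inf, inf); False: the empty set
    covered += whole_line              # the whole line always contains theta,
                                       # the empty set never does

print(covered / reps)  # about 0.95, yet for each individual interval we know
                       # with certainty whether theta is inside it.
```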
(Someone is now going to raise the issue of Schrödinger’s interval, where the interval is calculated automatically, and sealed in a box. Yes, in this situation we can place 95% probability on that specific interval containing the parameter, but it’s not the situation we usually have where someone has published a confidence interval, and it’s not the situation in the quiz).
And how about my readers? These questions were asked on both blogs (here and here) and also on Twitter, gleaning a handful of replies in all places. Votes here and on Twitter were majority wrong (and no-one got them all right). Interestingly, all three of the commenters on the Empty Blog were basically correct, though two of them gave slightly ambiguous replies; I think their intent was right. Maybe it helps that I've been going on about this for years over there.
4 comments:
>"We can generate valid confidence intervals for an unknown parameter with the following procedure: with probability 0.95, say “the whole number line”, otherwise say “the empty set”. If you repeat this many times, the long-run coverage frequency tends to 0.95, as 95% of the intervals do include the true parameter value."
Seems an odd confidence interval and I am not sure what we learn from this other than that CIs can be weird.
I found it helpful to consider that the true distribution might be:
a 1 in 1,000,000 chance of -250,000;
the rest of the time, 0.25 with a small amount of noise.
If the researcher's sample size is less than 100,000, they are likely to generate a 0.1 to 0.4 confidence interval. But the true mean is -0.00000025.
Clearly there are an infinite number of examples, like the above or more normal ones, that can produce a 0.1 to 0.4 confidence interval.
Clearly there was no information on whether the particular distribution in question was likely or highly likely to be a normal or a highly skewed example. (Also no info on the researcher's sample size.)
So clearly the statements are false, in the sense that they don't follow.
Was expecting more to it and probably tried too hard to find other issues as well.
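Roughly this sort of thing, say (the noise level and sample size aren't pinned down above; taking 20 draws and a noise sd of about 0.3 roughly reproduces the 0.1 to 0.4 interval):

```python
# Sketch of the mixture example: probability 1e-6 of -250,000, otherwise 0.25
# plus noise. The sample size (20) and noise sd (0.3) are illustrative guesses.
import numpy as np

rng = np.random.default_rng(2)
p_rare, rare_value, typical, noise_sd, n = 1e-6, -250_000.0, 0.25, 0.3, 20

true_mean = p_rare * rare_value + (1 - p_rare) * typical
print(true_mean)  # -0.00000025

rare = rng.random(n) < p_rare
sample = np.where(rare, rare_value, typical + rng.normal(0, noise_sd, n))

# Naive 95% CI treating the sample as Gaussian: mean +/- t * sd / sqrt(n).
half = 2.093 * sample.std(ddof=1) / np.sqrt(n)   # 2.093 is the t value for df = 19
print(sample.mean() - half, sample.mean() + half)  # roughly 0.1 to 0.4
```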
Yes, the fact that "CIs can be weird" means that you can't deduce anything meaningful from a CI. Where CI means confidence interval, that is; a credible interval, on the other hand, is extremely useful, as it describes where you believe the parameter to lie.
I think, however, your counterexample may not be valid, as the CI calculation is (probably) making a distributional assumption that is false (i.e. using the mean and sd of a sample in this way is only valid if the sample is Gaussian). Whereas CIs are not useful even if there are no errors in the auxiliary hypotheses involved in their creation. In your case, the CIs do not have correct coverage under repeated experimentation.
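A quick check of that coverage claim, using the same illustrative numbers as the sketch above (20 draws, noise sd of 0.3, the naive Gaussian-style interval):

```python
# Repeat the mixture experiment many times and count how often the naive 95% CI
# contains the true mean. The noise sd and sample size are the assumed values above.
import numpy as np

rng = np.random.default_rng(3)
p_rare, rare_value, typical, noise_sd, n = 1e-6, -250_000.0, 0.25, 0.3, 20
reps = 100_000

true_mean = p_rare * rare_value + (1 - p_rare) * typical  # about -0.00000025

hits = 0
for _ in range(reps):
    rare = rng.random(n) < p_rare
    sample = np.where(rare, rare_value, typical + rng.normal(0, noise_sd, n))
    half = 2.093 * sample.std(ddof=1) / np.sqrt(n)
    hits += abs(sample.mean() - true_mean) <= half

print(hits / reps)  # far below 0.95: this procedure's coverage is wrong here
```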
Well, you are the expert, and if you say my example is not valid, you are almost certainly correct. Yes, I accept that there is at least an error in the auxiliary hypotheses and that may well make it a bad example to use. However, does this really matter if no-one knows about the 1 in 1,000,000 chance?
Re "the CIs do not have correct coverage under repeated experimentation".
My 1 in 1,000,000 chance is well outside a 5%-95% range. If the experiment is to take the mean of 20 numbers drawn from this distribution, then the 1 in 1,000,000 value coming up is a rare case well outside the 5-95 range. Does it matter to the CI whether the 1 in 1,000,000 chance has a value of -250,000 or +100,000? I don't see that the CI is changed much, if at all, whether you repeat the experiment 100 times, 10,000 times or 10,000,000 times.
If the experiment is to take the mean of 10,000,000 numbers drawn from this distribution, and this is repeated 10,000,000 times, then the CI will be centred near 0 and I accept that here the 0.1-0.4 range does not have correct coverage under repeated experimentation. In that case fair enough, but this wasn't what I was intending.
Maybe this quibble is my fault, in that I didn't adequately explain that my example was finding the mean of just 20 numbers drawn from this distribution. (Or more likely I am still wrong, maybe in more different ways.)
Yes, the distribution is still not Gaussian, as there is a small, very low probability peak near -249,995, but this seems such a small divergence from the Gaussian assumption that I am inclined to try to dismiss it as a rather trivial error in the auxiliary hypotheses. Maybe I shouldn't get away with that?
D'oh, that last post is full of errors. I forgot to divide -249,995 by 20, and as it is a 95% CI I should have written 2.5-97.5% ranges, not 5-95. Hope these errors don't distract too much.