Sunday, June 04, 2006

Is Peer Review a Game of Chance?

No, this isn't going to be a boring whinge by me about the bad reviews I've received and how life isn't fair (in fact, I've had very few reviews that I have any major complaints about). The title of this post is actually the title of an interesting but flawed paper by Neff and Olden which was recently drawn to my attention. (The full text can be found here). The authors investigate the influence of randomness in reviewer decisions, based on a very simple (but IMO fairly reasonable) model. They define the "suitability" of a paper for a journal to be the proportion of researchers who would agree that the paper should be published in that particular journal. This has the nice property that it directly accounts for a journal's perceived quality, and furthermore can be directly applied to the reviewers too (with the slight modification of the judgement to "should be published after necessary revision"). They further define the "correct" threshold for publication to be an 80% level (ie, 80% of readers/reviewers agree it should be published), and investigate the effect of changes to the review procedure (eg using more or fewer referees, possibly with an editorial pre-screening) in terms of the quality of the resulting decisions. So far so good.

It's certainly an interesting topic, and they argue that some real journal data lends support to their model. That seems perhaps a bit tenuous (it's hard to rule much in or out IMO), but still reasonably plausible to me. Where they really fluff their lines is in the calculations they present from their model. They claim to calculate the "probability of wrongful rejection" and "probability of wrongful acceptance" and present figures for these in Table 1, Fig 2 and the text. Of particular interest, they claim that the former value (wrongful rejection) remains extremely low, only increasing from 6% to 7% and then 8% when the review process is tightened considerably by moving from 2 reviewers first to 3 and then 4 respectively (each reviewer is assumed to be armed with a veto, rather than a majority vote being used). The probability of wrongful publication, on the other hand, drops from a rather worrying 51% to a clearly better (but still perhaps rather high) 33% at the same time. Adding a layer of editorial review (effectively another referee, although with some differences) improves things still further. On the basis of this, they argue strongly that stiffening up the system would improve the quality of journals considerably.

It seemed immediately obvious to me that their numbers didn't pass the sniff test, so I investigated a little more carefully. What they have actually calculated is not the probability that a good manuscript is rejected (or vice versa), but instead the proportion of rejected (accepted) papers which should have been accepted (rejected). In order to do this, it is necessary to estimate (or assume) what proportion of initially submitted manuscripts are poor. The authors set this at a whopping 80%, which means that when they look at the heap of rejected manuscripts, the proportion of these which should have been published could not possibly exceed 20% unless the reviewers actively prefer poor papers over good! So the probabilities that they present as "wrongful rejection", of around 6-10%, seem low but do not actually address the question that many researchers will want to know the answer to (and will surely expect "wrongful rejection" to refer to), which is: if I, or anyone else, write a manuscript which qualifies as "good" according to the defined criterion, what is the probability that it gets rejected?
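To make the difference between the two quantities concrete, here is a minimal sketch in Python. It is not the paper's calculation: it assumes a crude two-class world in which every "good" paper has suitability 0.9 and every "poor" paper has suitability 0.5 (both values are hypothetical, chosen only for illustration), that 80% of submissions are poor (the paper's assumption), and that each reviewer independently approves a paper with probability equal to its suitability and holds a veto.

```python
def rejection_probability(suitability: float, n_reviewers: int) -> float:
    """Chance that at least one of n independent reviewers vetoes the paper."""
    return 1.0 - suitability ** n_reviewers


def good_share_of_rejected(frac_poor: float, s_good: float, s_poor: float,
                           n_reviewers: int) -> float:
    """Proportion of the rejected pile that consists of good papers (Bayes' rule)."""
    rejected_good = (1.0 - frac_poor) * rejection_probability(s_good, n_reviewers)
    rejected_poor = frac_poor * rejection_probability(s_poor, n_reviewers)
    return rejected_good / (rejected_good + rejected_poor)


FRAC_POOR = 0.8  # proportion of poor submissions assumed in the paper
S_GOOD = 0.9     # hypothetical suitability of every good paper
S_POOR = 0.5     # hypothetical suitability of every poor paper

for n in (2, 3, 4):
    p_reject_good = rejection_probability(S_GOOD, n)
    share = good_share_of_rejected(FRAC_POOR, S_GOOD, S_POOR, n)
    print(f"{n} reviewers: P(rejected | good) = {p_reject_good:.1%}, "
          f"good share of rejected pile = {share:.1%}")
```

Under these assumptions the share of good papers in the rejected pile stays well below 20% however many reviewers are used, while the chance that a given good paper is rejected is several times larger and climbs with each extra reviewer.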

This latter calculation is pretty trivial, and does not depend on any assumptions about how many good and bad manuscripts there are in total. Even under the weakest review system they consider, where acceptance depends on a mere 2 reviewers agreeing with publication, the real probability of a paper of 90% quality (clearly well above the defined quality threshold for publication) being rejected is actually as high as 19%, way above the 6% figure they present. With 4 reviewers, the probability of rejection for the same paper increases to a whopping 34%, and a more borderline paper of 84% quality is actually more likely to be rejected than not. Would scientists be willing to accept a system where high quality research had such a high probability of rejection? I doubt it, but I guess that is arguable, depending on how one views the relative importance of the two problems of wrongful acceptance versus wrongful rejection. More importantly, the case for more stringent review - if there is one - needs to be made on the basis of a valid representation of its likely effects, not on their misleading calculations.
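For anyone who wants to check these figures, the calculation can be sketched in a few lines of Python. The assumption (mine, following the veto rule described above) is that each reviewer independently approves a paper with probability equal to its suitability p, so n reviewers all approve with probability p^n and the paper is rejected with probability 1 - p^n. The function name is just for illustration.

```python
def p_reject(suitability: float, n_reviewers: int) -> float:
    """Rejection probability when every one of n independent reviewers must approve."""
    return 1.0 - suitability ** n_reviewers


# (suitability, number of reviewers) for the cases discussed in the text
cases = [(0.90, 2), (0.90, 4), (0.84, 4)]

for suitability, n in cases:
    print(f"suitability {suitability:.0%}, {n} reviewers: "
          f"rejected with probability {p_reject(suitability, n):.1%}")

# suitability 90%, 2 reviewers: rejected with probability 19.0%
# suitability 90%, 4 reviewers: rejected with probability 34.4%
# suitability 84%, 4 reviewers: rejected with probability 50.2%
```

Note that nothing here depends on the proportion of poor manuscripts in the submission pile.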

I can't resist the small snark that perhaps the publication of this paper proves the authors' main point after all: too many poor papers get through the reviewing process as it stands :-) Perhaps that is a bit harsh - the authors are looking at the problem from the point of view of the quality of the resulting journal and the number of poor papers it is likely to contain. Of course, due to their initial assumption that the vast majority of submitted manuscripts are poor (which I suspect is substantially exaggerated), the number that make it through is quite high, and increasing the stringency of the review will help to cut down this proportion. But this comes at quite a cost (in terms of rejecting good research) which is not spelt out at all clearly in the paper.


Adam said...

If you are looking at the perceived quality of the journal involved, then it probably is best to kill a few innocents rather than let the guilty get through.

However, would it be best to have a hierarchy of journals, where the review process is loosened (up to a specified point) as you go down the pyramid - but the focus of the journal tightens, e.g. becomes more specialist at the same time?

That way, very high-quality work would be published at a (more) general level where cross-discipline usefulness requires people to know that it's good work, whereas in the specialist journals, potentially flawed papers would still see the light of day so they could get discussed and corrected and any useful (or wrongfully rejected high quality) work would still see the light of day.

I think some people may well claim that this is the current situation and it may well be so to a certain extent. Others may disagree. ;)

James Annan said...


I think that is exactly what we do have now - and it works reasonably well. Just about everything that isn't horribly flawed can find a home, but the bad stuff (a) takes a long time to get through review (which in itself keeps the quality up, since top scientist A may publish 4 papers in the time that weak scientist B has to revise and resubmit his paper several times) and (b) ends up in journals with lower reputations and probably a lesser readership. Actually, it's not really a case of specialised v general: the two general journals (Science + Nature) are a bit of an exception, and basically everything else appears in specialist journals which have a fairly clear hierarchy. Of course, we can still talk sensibly about the plusses and minuses of tinkering with the details in terms of raising or lowering the bar.