The UK periodically goes through a Research Excellence Framework (Ref) exercise, assessing research over a five-year period, currently 2008-2013. Every submission from across the country is assessed on its quality, and two important criteria are the prestige of the journal in which the work appears and the rigour of that journal's peer-review process. Peer reviewers are researchers with established reputations. But while these peers can provide careful and reasoned evaluations of work, a common complaint is that they are biased toward accepting work that reinforces their own ideas, thus stifling innovative thinking.
This argument ignores the fact that the vast majority of published work is in any case not particularly innovative. The papers are well done, but they form small extensions of existing ideas and in most cases are hardly cited at all. A small set of papers accounts for the overwhelming majority of citations. Given the sheer number of submissions and publications, the peer review process (which usually consists of anonymous unpaid academics reviewing papers on their own time as a service to the journal) does not always recognise these papers ex ante. The task of finding an innovative paper is akin to finding a rough diamond in a sea of pebbles. The anonymity of the peer review process also works against incentives to publish new and innovative research. Since referees will not get credit for championing a new idea, they have little incentive to stick their necks out and pass a paper that might contain an error.
In most cases, recognising innovation is a job for the gatekeepers to the peer review process – editors of journals – who are not anonymous and whose journals’ reputations essentially rise and fall along with their citation counts. Editors (whose reputations in turn are influenced by the reputations of their journals) are strongly motivated to seek out work that steps away from where research is currently being conducted. No journal wants to reject the next Black-Scholes paper or the next Akerlof Lemons paper, both of which were rejected on their first submissions. However, no journal wants to publish the next cold fusion paper either: a paper that claimed to have observed nuclear fusion processes at room temperature and generated great excitement when first published, but whose results were never replicated and were ultimately dismissed.
The peer review process remains a necessary one. There is a surprisingly good chance that research results that push boundaries will not stand up to scrutiny, due to what is termed collective unintentional data mining. A scholar can approach a data set or experiment in a multitude of ways. That is not, in itself, a problem, since there is no single obvious path or method appropriate for every situation. However, an unethical scholar can try many approaches and build a paper on the one that happened to generate a positive (publishable) result. In the entirely ethical version, every scholar acts correctly and takes only one shot at the data. However, negative results are much more difficult to publish than positive ones, since the peer reviewer is likely to dismiss the paper on the grounds that the authors just did not look hard enough for positive results. Hence, an author with a negative result is likely to abandon the paper rather than submit it to a journal. Thus, with a multitude of scholars each choosing a slightly different approach and each opting to submit only positive results, the end result of the process is the same as in the unethical case – an artefact of the choices rather than a profound truth.
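The mechanism can be illustrated with a minimal simulation. This is only a sketch under stated assumptions: 2,000 hypothetical scholars each take a single shot at an effect that does not exist (the true mean is zero), and each submits only if the result clears a conventional 1.96 significance threshold. The function names, the sample size, the number of scholars and the threshold are all illustrative choices, not figures drawn from any study.

```python
import random
import statistics


def one_study(rng, n=50):
    """One scholar, one shot: draw n observations whose true mean is 0
    and return the z-statistic of the sample mean (known sd of 1)."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(sample) * n ** 0.5


def simulate(num_scholars=2000, threshold=1.96, seed=0):
    """Every scholar behaves ethically and tests the data once, but only
    'positive' results (|z| above the threshold) are submitted."""
    rng = random.Random(seed)
    z_values = [one_study(rng) for _ in range(num_scholars)]
    submitted = [z for z in z_values if abs(z) > threshold]
    return len(z_values), len(submitted)


total, published = simulate()
share = published / total
print(f"{published} of {total} studies report an effect ({share:.1%})")
print("True effect in every study: none")
```

Under these assumptions roughly one study in twenty clears the threshold by chance alone. Because the other nineteen are never submitted, every paper that reaches a journal reports an effect that is not there, even though no individual scholar did anything wrong.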
Is this unintentional data mining a big problem? Yes. It is astounding how many results do not seem to stand the test of time. This is not just a problem in the social sciences, which rely upon econometric techniques applied to real data, but even in more experimental sciences. There is almost a cottage industry of researchers devoted to trying to replicate earlier work and finding themselves unable to do so.
So when reviewing new research, how do editors balance innovation against robustness? Experience helps. Having seen a great deal of work, editors can often recognise a fresh approach or idea, and most are willing to back new ideas if they see no obvious concern. For work where concerns might be less obvious, the best way to assess the fragility of a study is to ask for some test variants and gauge how sensitive the results are to them. Editors do not strive for certainty – if a paper survives a reasonable “what about this” hurdle, that is usually enough for most journals. The review process should also identify related work that might suggest potential problems. It is not uncommon to ask authors to reconcile their results with another study – to identify a reason for the difference. In fact, new insights very often arise precisely when reconciling conflicting results or theories.
In the end, the question is how many false insights academics are willing to accept as a result of the data mining problem in order to ensure that innovative ideas are not weeded out. Enlightened editors have the responsibility of identifying diamonds in the rough, encouraging them and even actively engaging in the polishing themselves. However, the growing alarm about replication of results suggests that, taken as a whole, there may be as much reason to be careful as carefree.
The Ref and other similar exercises are not meant to identify innovative ideas. Through their focus on journal prestige and the quality of the peer review process, they emphasise well-done research – rigorous research that is free of methodological flaws. However, research rigour is a necessary but not sufficient condition for innovation.
Raghavendra Rau is the Sir Evelyn de Rothschild professor of finance, University of Cambridge, and co-editor of Financial Management. Marc Lipson is the Robert F. Vandell research professor, University of Virginia, and co-editor of Financial Management.