Financial Times FT.com

The league table's hidden flaw

By Robert Matthews

Published: December 2 2005 02:00 | Last updated: December 2 2005 02:00

Is your local school under-performing and your nearest hospital a death trap? Once upon a time, the only way to find out was via the local rumour mill. No longer: today we can turn for answers to the plethora of statistics, league tables and targets produced by government departments.

Or at least, that is what government ministers would have us believe. In reality, the reputation of governments for massaging official statistics has undermined trust in what should be a reliable source of insight into the effectiveness of their policies. As a recent survey by the Office for National Statistics (ONS) showed, most of us are happy to trust the raw data: it is the idea of the data being "cooked" by ministers that we find unpalatable.

This week, Gordon Brown, the chancellor, sought to boost trust in official statistics by announcing plans to make the ONS wholly independent of government. The move has been widely welcomed, not least because it boosts evidence-based policy, where progress is assessed using facts rather than anecdote, guesswork or spin.

Yet it does nothing to address a more fundamental concern: the dangers of pushing even unimpeachable data too far. For, while quantitative targets and performance indicators may seem like an advance on vague aspirations, their apparent clarity is an illusion.

Statisticians have long warned of the threat this poses to the reliability of decision-making by ministers, business and the public alike. So beguiling is the illusion, however, that these warnings have so far fallen on deaf ears.

Much of the concern has focused on that most high-profile of statistical indicators: performance "league tables". Introduced by the Conservative government more than a decade ago, the tables were trumpeted as a source of reliable performance data allowing parents and patients to make informed choices.

Initial criticism - principally from teaching unions - focused on the lack of any attempt to incorporate factors such as the socio-economic backgrounds of the pupils. But statisticians warned of a more basic flaw. In ranking each institution, the league tables take a single number - such as exam grade performance - as a measure of performance. While this makes the tables easy to understand, it also hides the fact that each figure is based on a limited sample, and is thus inherently uncertain. And just as with opinion polls, the smaller the sample, the greater the uncertainty surrounding the measured performance level.

A more realistic picture of performance emerges by following the practice of opinion pollsters and stating this uncertainty using "plus or minus" error bars. These are calculated, using standard statistical methods, from the size of the sample and added to the basic figure. This will show the range of values within which the true performance figure could plausibly lie.
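As a rough illustration of how such error bars depend on sample size, the sketch below uses the normal approximation to a binomial proportion. That choice of method is an assumption, since the article does not say which calculation the statisticians used, and the pupil numbers are invented for the example.

```python
import math

def margin_of_error(successes, n, z=1.96):
    """Return (observed rate, plus-or-minus margin) for a pass rate.

    Uses the normal approximation to the binomial; z=1.96 gives an
    approximate 95 per cent interval. Both are illustrative assumptions.
    """
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p, z * standard_error

# Hypothetical figures: the same 60 per cent pass rate in two cohorts.
small_rate, small_margin = margin_of_error(30, 50)      # 50 pupils
large_rate, large_margin = margin_of_error(600, 1000)   # 1,000 pupils

print(f"50 pupils:   {small_rate:.0%} plus or minus {small_margin:.1%}")
print(f"1000 pupils: {large_rate:.0%} plus or minus {large_margin:.1%}")
```

On these made-up numbers the small cohort's margin is more than four times that of the large one, even though the headline pass rate is identical.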

Shortly after the launch of the league tables, Professor Harvey Goldstein of the Institute of Education at the University of London worked out the error bars for the rating of 10 schools in one local authority. With only a relatively small number of pupils taking examinations each year, the resulting error bars for their performance were wide - so wide that they overlapped those of every other school, making a mockery of any attempt at ranking them.

Ministers responded to the initial criticisms by introducing "value-added" factors, reflecting the success of schools in improving pupil performance. Yet this still fails to deal with the small-sample-size problem. Using exam data from more than 300 schools and colleges, Prof Goldstein has shown that the resulting error bars are so wide that league tables demonstrate only that the top 15 per cent of schools are better than the bottom 15 per cent - which was obvious long before the introduction of league tables.

The sample-size problem has since been found to undermine league tables for other institutions whose performance is calculated from small numbers, such as fertility clinics. In common with school league tables, they often show dramatic changes in rankings. These are often taken to signal dramatic change in performance. In reality, they are merely expected random variation in the quoted performance level - an effect that would be made clear if error bars were included.

Government departments have rejected the use of error bars on the grounds that they may cause confusion. Yet such indications of error have long featured in reports of opinion poll findings, with media reports describing differences within the stated margin of error as being "too close to call".

In contrast, some performance targets set by government suggest no lack of confusion on statistical issues among ministers. Following stories about hospital patients dying from so-called superbug infections, John Reid, health secretary at the time, announced a year ago that hospitals would be required to cut rates of infection each year, with the aim of producing a 50 per cent cut by 2008. The target was described by Mr Reid as "achievable, measurable and not too burdensome", and now forms part of the "star rating" system for assessing NHS trusts.

Equivalent to an annual reduction of about 20 per cent, it does indeed seem pretty modest. Yet trust managers are likely to find the target hard to meet.
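The arithmetic behind that annual figure can be checked with a short sketch. It assumes the 50 per cent cut is spread over three equal yearly steps, which is an inference from the dates given rather than something the article states.

```python
def annual_cut(total_cut, years):
    """Constant yearly reduction that compounds to total_cut over `years`."""
    return 1 - (1 - total_cut) ** (1 / years)

# Assumed: a 50 per cent cut achieved over three years.
rate = annual_cut(0.5, 3)
print(f"Required annual reduction: {rate:.1%}")
```

Under that assumption the required yearly cut comes out at a little over 20 per cent.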

As with league table rankings, part of the reason is small sample size. In spite of the headlines, most hospital trusts see only a few dozen cases of superbug infection each year. Simple random variation is thus easily capable of producing dramatic changes in infection rates if measured annually as the government demands.

But the target also ignores the fact that infections tend to come in outbreaks, producing surges in patient numbers. Combined with simple random variation, this can easily produce enough extra cases to swamp any genuine improvement. According to an analysis published last month in the British Medical Journal, NHS trusts that meet the government target have only a 50:50 chance of their achievement surviving the vagaries of chance.

At the same time, hospital trusts failing to make any genuine reduction in infection rates can fluke their way past the government target. According to the BMJ analysis, by David Spiegelhalter of the Medical Research Council's Biostatistics Unit, Cambridge, a typical trust with 32 superbug cases each year has a 25 per cent chance of meeting the government target without achieving real improvement.
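To see how chance alone can hand a trust its target, here is a minimal sketch under simplified assumptions: yearly case counts are treated as independent Poisson draws around an unchanged average of 32, and "meeting the target" is taken to mean this year's count being at least 20 per cent below last year's. This is not the model used in the BMJ analysis, so it should not be expected to reproduce the 25 per cent figure exactly.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k cases when the long-run average is `mean`."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def chance_of_fluke(mean_cases, cut_pct=20, max_count=200):
    """Chance that this year's count is at least cut_pct per cent below
    last year's, when the true rate has not changed at all.

    Assumes independent Poisson counts in both years (a simplification).
    """
    pmf = [poisson_pmf(k, mean_cases) for k in range(max_count + 1)]
    cdf, running = [], 0.0
    for p in pmf:
        running += p
        cdf.append(running)
    total = 0.0
    for last_year, p in enumerate(pmf):
        # Largest count this year that still counts as meeting the cut.
        threshold = last_year * (100 - cut_pct) // 100
        total += p * cdf[threshold]
    return total

fluke = chance_of_fluke(32)
fluke_large = chance_of_fluke(320, max_count=1000)
print(f"Trust with about 32 cases a year:  {fluke:.1%}")
print(f"Trust with about 320 cases a year: {fluke_large:.1%}")
```

Even in this stripped-down version, a trust with a few dozen cases has a sizeable chance of an apparent success it did nothing to earn, while the chance is far smaller for a hypothetical trust with ten times as many cases.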

Dr Spiegelhalter identified a series of other pitfalls, ranging from the precise definition of rates to the phenomenon of regression to the mean. Along with the small-sample-size problem, these have relevance for policy in many other areas. Targets established to tackle headline-grabbing issues are especially vulnerable to statistical effects. By focusing on serious but unusual threats - such as MRSA - such targets will involve relatively small numbers that can fluctuate wildly year on year.

Rapid action by ministers over some headline-grabbing threat can also prove misguided - and expensive. A spate of deaths among teenagers from meningitis in 1999 prompted the Department of Health to introduce mass vaccination for one type of the disease. Infection rates duly plunged, prompting the department to hail the £25m programme "a great victory for the NHS". Most of the reduction is now believed to be due to the statistical effect of regression to the mean, in which sudden peaks or troughs revert to levels closer to the long-term mean.

Some motoring organisations claim the same effect explains the alleged life-saving benefits of speed cameras. These are sited at accident black spots, which by definition have unusually high rates of accidents. Ministers have been keen to ascribe any subsequent drop in casualty numbers to the presence of the cameras. It may, however, be nothing more than an anomalously high rate regressing to its long-term mean.
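Regression to the mean is easy to demonstrate with a small simulation. In the sketch below every site has exactly the same underlying accident rate and nothing is ever installed; the "black spots" are simply the sites that happened to have the worst first year. The number of sites, the rate and the random seed are all invented for illustration.

```python
import math
import random

def poisson_draw(rng, mean):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_black_spots(n_sites=1000, true_rate=5.0, n_selected=50, seed=1):
    """Pick the worst sites in year one and compare them with year two.

    Every site shares the same true rate, so any fall among the selected
    sites is pure regression to the mean, not the effect of an intervention.
    """
    rng = random.Random(seed)
    year1 = [poisson_draw(rng, true_rate) for _ in range(n_sites)]
    year2 = [poisson_draw(rng, true_rate) for _ in range(n_sites)]
    worst = sorted(range(n_sites), key=lambda i: year1[i], reverse=True)[:n_selected]
    before = sum(year1[i] for i in worst) / n_selected
    after = sum(year2[i] for i in worst) / n_selected
    return before, after

before, after = simulate_black_spots()
print(f"Worst sites, year one: {before:.1f} accidents on average")
print(f"Same sites, year two:  {after:.1f} accidents on average")
```

The selected sites show a marked fall in the second year even though nothing about them has changed, which is the pattern that can be mistaken for the effect of a camera or a vaccination campaign.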

Any government worth its salt needs to set meaningful targets based on reliable statistics. But it will take more than this week's move to make the ONS independent to break the link between lies, damned lies and official statistics.

The writer is visiting reader in science at Aston University, Birmingham