Underperforming on performance

What is the collective noun for indicators of public service performance? A thicket? A fudge? Whatever it may be, the British government has announced yet another league table, this time packed with indicators of patient safety in English National Health Service hospitals.

Logging on to the NHS Choices website, I discover my local hospital is “among the worst” as far as “infection control and cleanliness” are concerned. The website adds that all “Care Quality Commission national standards” have been met. This is baffling. The hospital is filthy yet meets all care quality standards? Maybe the collective noun should be “a contradiction of indicators”.

“I think we’re getting a bit overwhelmed now with these packages of indicators,” says John Appleby, chief economist of the King’s Fund, a healthcare think-tank. “As a patient, I wouldn’t know what to make of these at all.”

If they are merely useless and confusing, that’s one thing. But some indicators in the past have caused serious collateral damage. Consider two examples from either side of the Atlantic.

In the UK in the late 1990s, Tony Blair’s government set a range of targets for how quickly ambulances should respond to emergency calls. In an “immediately life-threatening” case in an urban area, first responders should arrive within eight minutes, three-quarters of the time. The target swiftly backfired. By 2003 the data were showing odd patterns – for one ambulance service, more than 900 calls were recorded as having been met in seven minutes and 59 seconds, with just a handful met in eight minutes. The definition of “immediately life-threatening” mysteriously varied by a factor of five from one ambulance service to the next. Crews were split and given bikes or small cars, allowing a lone paramedic on a bike to hit a target, even if he couldn’t take you to hospital.

In the US, “report cards” provide data on the performance of cardiac surgeons and cardiac wards. David Dranove, Daniel Kessler, Mark McClellan and Mark Satterthwaite, four economists who studied the report cards, found a most unwelcome consequence: doctors resisted operating on the severely ill and favoured surgery for patients who might not even need it. A healthy patient is a strong candidate to thrive after heart surgery, no?

None of this should surprise. There are three ways to improve your score on any performance metric: first, actually improve performance; second, focus on ways to look good on the metric in question; third, cheat.

That said, surely performance metrics can sometimes identify and encourage what’s best in public service. What might help is a sense of who is supposed to use these metrics, and how they might react.

Gwyn Bevan of the London School of Economics suggests four models of public service. In “trust and altruism”, noble doctors and teachers always do their best, and indicators help them do their jobs. In “targets and terror”, public servants are assumed to be selfish, whipped into shape by a central government with a dashboard of performance data. In the “quasi-market” system, the indicators are provided to the public, who act as consumers and choose their preferred school or hospital. Finally, “name and shame” uses league tables to humiliate losers and lionise winners.

None of these four systems is obviously absurd, so what does the evidence suggest? Devolution in the UK provides an interesting natural experiment. The Welsh government abolished school league tables and the Scottish government eschewed targets for hospital waiting times. In both cases, researchers from Bristol University and elsewhere showed that the English system worked better. This supports “name and shame” (for schools) and “targets and terror” (for hospitals). It is bad news for the “trust and altruism” model.

We know that true markets often work well but there are question marks over the effectiveness of “quasi-markets” for education and healthcare. The British state education system consists not of families choosing the best schools but of good schools choosing the best families, while bad schools chug along without going out of business. Americans may be savvy consumers of cars or phones but appear to pay little attention to publicly available evidence on the quality of hospital care.

“Name and shame” is the idea that indicators work not because they inform bureaucratic overseers, nor because they help consumers pick the best services, but simply because nobody wants the embarrassment of propping up the bottom of a league table. It seems a crude approach but an influential research paper by Judith Hibbard, Jean Stockard and Martin Tusler found evidence that “name and shame” might work.

Hibbard and her colleagues studied how Wisconsin hospitals reacted to a report on quality of care. Some of the hospitals were included in a widely disseminated quality evaluation. Others, chosen at random, received a confidential report on their own performance – the ideal approach for a world of “trust and altruism”. A third group of hospitals received no report at all.

Hibbard’s research suggested that Wisconsin healthcare did not function as a regular market. Poorly performing hospitals were not afraid of losing market share, and rightly so. But the poor performers named in the public report did make substantial efforts to improve, nonetheless – citing a concern for their reputation.

Perhaps we have that collective noun after all: it’s an “embarrassment of indicators”.

Tim Harford’s latest book, ‘The Undercover Economist Strikes Back’, is now available in paperback (Little, Brown); Twitter: @TimHarford

Illustration by Harry Haysom

Copyright The Financial Times Limited 2017. All rights reserved.