In defence of statistics

There are a lot of statistics thrown around in the copyright wars. Take this example from the International Federation of the Phonographic Industry (IFPI):

In total, 3.1 million more people were using peer-to-peer networks in March 2002 than in February 2001 when Napster was at its peak. CD burning has also badly hit the European music sector. In Germany, the number of blank CDs used to burn music was estimated at 182 million in 2001, compared to 185 million CD album sales, according to a survey from March 2002 by market research firm Gfk. In Spain, 71 million albums were sold in 2001 compared to an estimated 52 million blank CDs used to burn music, according to a survey by Millward Brown/Alef.

Most people's eyes glaze over at a passage like this, but we still generally accept the statistics without question. Alternatively, we resort to labelling them ‘lies, damned lies and statistics’ if they do not support our point of view.

In defence of statistics, however, the discipline itself is just number crunching according to pre-determined, long-established rules. The ‘lies’, as perceived, tend to come from unreliable surveys producing unreliable data, and from the interpretation or selective use of the results once the numbers have been crunched. So the lies tend to follow from the abuse of statistics rather than from their use.

Not all ‘lies’ arise from deliberate manipulation of statistics. Sometimes they arise by accident, as the result of a misunderstanding. Suppose that a survey says, ‘The number of artists signed each year by the major music labels has doubled since 1970.’ Somebody else interprets that as, ‘Every year since 1970, the number of artists signed by major labels has doubled.’ In the first case, if one artist was signed in 1970, two would have been signed in 2002. In the second case, if one artist was signed in 1970, there would be two in 1971, four in 1972, eight in 1973 and so on, to more than four billion in 2002. The statistic has been distorted, but people still accept it.
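The gap between the two readings is easy to underestimate, so a short calculation may help. The sketch below is illustrative only: the starting figure of one artist in 1970 is the hypothetical used above, and the function names are invented for this example.

```python
# Illustrative sketch: two readings of "has doubled since 1970".
# The figure of one artist signed in 1970 is the hypothetical from the text,
# not real industry data.

START_YEAR = 1970
END_YEAR = 2002
ARTISTS_IN_START_YEAR = 1


def doubled_once(start_count):
    """Reading 1: the latest figure is twice the start-year figure."""
    return start_count * 2


def doubled_every_year(start_count, start_year, end_year):
    """Reading 2: the figure doubles in each year after start_year."""
    years_elapsed = end_year - start_year
    return start_count * 2 ** years_elapsed


reading_1 = doubled_once(ARTISTS_IN_START_YEAR)
reading_2 = doubled_every_year(ARTISTS_IN_START_YEAR, START_YEAR, END_YEAR)

print(f"Doubled once since 1970: {reading_1} artists signed in 2002")
print(f"Doubled every year since 1970: {reading_2:,} artists signed in 2002")
```

Under the second reading, the 2002 figure comes to 4,294,967,296 artists, a number of the same order as the world's population at the time. A quick check of this kind is often enough to show that a statistic has been garbled in the retelling.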

In the original T182 course, by kind permission of the BBC, we showed students two short sequences from the television series Yes, Prime Minister by Jonathan Lynn and Antony Jay, which gave a useful illustration of the abuse of statistics:

In the first sequence, part of the episode entitled ‘The Grand Design’, the two civil servants, Sir Humphrey Appleby and Bernard Woolley, are discussing surveys. Through a series of cleverly loaded questions, Humphrey tricks Bernard into agreeing that he is both for and against reintroducing national service, hence demonstrating that we should be careful about relying on surveys.

In the second sequence, part of the episode called ‘The Smokescreen’, the Prime Minister is proposing to pursue a campaign to reduce smoking. Sir Humphrey is attempting to dissuade him from such a course of action. The Prime Minister for once gets the upper hand by demonstrating that Sir Humphrey refers to any numbers that support his argument as ‘facts’ and any numbers that undermine it as ‘statistics’, the latter label implying that such numbers are unreliable.

I used another word above when talking about statistics: ‘data’. My thesaurus suggests that data is another word for facts. That does not really fit here. It may help to think of Bernard's answers to Sir Humphrey's questions in the first clip as data. Each ‘yes’ is a piece of data. We can consider data as the building blocks for ‘facts’. So if we use data selectively or put it together in a particular way, the building that becomes our facts can be made to look as we want it to look – another tactic in the copyright wars.

None of this changes the second rule of critical analysis:

Be sceptical. Question and interpret with care statistics used to support a particular perspective or set of values.

Although the underlying mathematics may be sound, statistics tend to be wielded to support particular agendas, so treat them with care.

"Carlyle said ‘A lie cannot live’; it shows he did not know how to tell them."

(Mark Twain)

