Exploring data: Graphs and numerical summaries


# 1.4.2 Measures of location

Everyone professes to understand what is meant by the term ‘average’, in that it should be representative of a group of objects. The objects may well be numbers from, say, a batch or sample of measurements, in which case the average should be a number which in some way characterises the batch as a whole. For example, the statement ‘a typical adult female in Britain is 160 cm tall’ would be understood by most people who heard it. Obviously not all adult females in Britain are the same height: there is considerable variation. To state that a ‘typical’ height is 160 cm is to ignore the variation and summarise the distribution of heights with a single number. Even so, it may be all that is needed to answer certain questions. (For example, is a typical adult female shorter than a typical adult male?)

But how should this representative value be chosen? Should it be a typical member of the group, or should it be some representative measure calculated from the collection of individual data values? Believe it or not, there are no straightforward answers to these questions. In fact, two different ways of expressing a representative value are commonly used in statistics, namely the median and the mean. Which of these provides the better numerical summary is not clear-cut: the choice depends on the nature of the data themselves, on the particular preference of the data analyst, or on the use to which the summary statement is to be put. The median and the mean are both examples of measures of location of a data set; here the word ‘location’ is essentially being used in the sense of the position of a typical data value along some sort of coordinate axis.

We deal with the median and the mean in turn, as well as considering the concept of the mode of a data set. (In a sense the mode is another measure of location.)
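To make the three ideas concrete before they are treated in detail, here is a minimal sketch using Python's standard `statistics` module. The batch of numbers is made up purely for illustration (it is hypothetical and not taken from the course), and the helper name `measures_of_location` is likewise an illustrative choice. The batch includes one unusually large value, so the three measures do not agree.

```python
import statistics


def measures_of_location(values):
    """Return the mean, median and mode of a batch of numbers (illustrative helper)."""
    return {
        # Mean: the sum of the values divided by how many there are.
        "mean": statistics.mean(values),
        # Median: the middle value once the batch is sorted
        # (the average of the two middle values if the batch size is even).
        "median": statistics.median(values),
        # Mode: the most frequently occurring value
        # (in Python 3.8+, the first one encountered if there is a tie).
        "mode": statistics.mode(values),
    }


# Hypothetical batch of seven values, one of which is much larger than the rest.
batch = [2, 3, 3, 4, 5, 6, 19]
summary = measures_of_location(batch)

for name, value in summary.items():
    print(f"{name}: {value}")
```

For this batch the mean is 42 / 7 = 6, the median is 4 and the mode is 3. The single large value (19) pulls the mean above most of the data, while the median and mode are unaffected by it; this is one reason why the better choice of representative value depends on the nature of the data.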

M248_1