Measuring location has two components:
gathering data about the quantity of interest
determining a value to represent the location of the data.
The task of gathering appropriate data is somewhat problem-specific – general strategies are available, but exact details usually need to be decided for each problem. To determine the price of an electric kettle, for example, we would have to decide the size and type of kettle we’re interested in, where and when it is purchased, and so forth. In contrast, choosing a value to summarise the location of a set of data is more straightforward. In this section, we will focus on the two most common measures of location: the median and the mean. The data gathered about the quantity of interest does not affect the way we calculate these location measures.
In order to measure how prices change, we need data on prices and some way of measuring their overall location. Price data take many forms.
In examining the overall location, prices of all goods are relevant, but some are more important than others. Ballpoint pens are relatively unimportant in most people’s shopping baskets, coffee prices are unimportant for tea drinkers, and chicken prices are of little concern to vegetarians. The first batch of price data we will look at is coffee prices.
Table 1 shows prices of a 100 g jar of a well-known brand of instant coffee obtained in 15 different shops in Milton Keynes on the same day in February 2012.
Table 1 Coffee prices (in pence)
299 |
315 |
268 |
269 |
295 |
295 |
369 |
275 |
268 |
295 |
279 |
268 |
268 |
295 |
305 |
There are several points to note concerning these prices.
They relate to a particular brand of coffee. You might expect the price to vary between brands.
They relate to a standard 100 g jar. You might expect the price per gram of this brand of coffee to vary depending upon the size of the jar – larger jars are often cheaper (per gram).
They relate to a particular locality. You might expect the price to vary depending upon where you buy the coffee (e.g. central London, a suburb, a provincial town, a country village or a Hebridean island).
They relate to a particular day. You might expect the price to vary from time to time depending upon changes in the cost of raw coffee beans, costs of production and distribution, and the availability of special offers.
Nevertheless, although we have data for a fixed brand of coffee, size of jar, locality and date of purchase, this batch of prices still varies from the lower extreme of 268p to the upper extreme of 369p. (In symbols: $E_L = 268$ and $E_U = 369$.) One of the most likely reasons for this is that the prices were collected from different kinds of shops (e.g. supermarket, petrol station, ethnic grocery and corner shop).
For all these reasons, it is impossible to state exactly what the price of this brand of instant coffee is. Yet its price is, in its own small way, relevant to the question: Are people getting better or worse off? That is, if you drink this particular coffee, then changes in its price in your locality will affect your cost of living. Similarly, your costs and economic well-being will also be affected by what happens to the prices of all the other things you need or like to consume.
On the other hand, someone who never buys instant coffee will be unaffected by any change in its price; they will be much more interested in what happens to the prices of alternative products such as ground coffee, tea, milk or fruit juice. The problem of measuring the effect of price changes on individuals with different consumption patterns will be considered in Section 5.
Despite the variability in the data, Table 1 does provide some idea of the price you would expect to pay for a 100 g jar of that particular instant coffee in the Milton Keynes area on that particular day. The information provided by the batch can be seen more clearly when drawn as a stemplot, shown in Figure 1 of Example 2.
Figure 1 Stemplot of coffee prices from Table 1
This stemplot shows at a glance that if you shop around, you might well find this brand of coffee on sale at less than 270p. (Indeed some stores seem to have been ‘price matching’ at the lowest price of 268p.) On the other hand, if you are not too careful about making price comparisons then you might pay considerably more than 300p (£3). However, you are most likely to find a shop with the coffee priced between about 270p and 300p. Although there is no one price for this coffee, it seems reasonable to say that the overall location of the price is a bit less than 300p.
The median of the batch is a useful measure of the overall location of the values in a batch. It is defined as the middle value of a batch of figures when the values are placed in order. Let us examine in more detail what that means.
The stemplot in Figure 1 shows the prices arranged in order of size. We can label each of these 15 prices with a symbol indicating where it comes in the ordered batch. A convenient way of showing this is to write each value as the symbol $x$ plus a subscript number in brackets, where the subscript number shows the position of that value within the ordered batch.
Figure 2 shows the 15 prices written out in ascending order using this subscript notation.
Figure 2 Subscript notation for ordered data
The lower extreme, $E_L$, is labelled $x_{(1)}$ and the upper extreme, $E_U$, is labelled $x_{(15)}$. The middle value is the value labelled $x_{(8)}$ since there are as many values, namely 7, above the value of $x_{(8)}$ as there are below it. (This is not strictly true here, since the values of $x_{(9)}$, $x_{(10)}$ and $x_{(11)}$ happen also to be actually equal to the median.)
This is illustrated in Figure 3 by a V-shaped formation. The median is the middle value, so it lies at the bottom of the V. (This way of picturing a batch will be developed further in Subsection 3.2.)
Figure 3 Median of 15 values
If you wanted to make a more explicit statement, then you could write: The median price of this batch of 15 prices is 295p.
If we picture any batch of data as a V-shape like Figure 3, the median of the batch will always lie at the bottom of the V. In the ordered batch, it is more places away from the extremes than any other value.
In general, the median is the value of the middle item when all the items of the batch are arranged in order. For a batch of size $n$, the position of the middle value is $\frac{1}{2}(n+1)$. For example, when $n = 15$, this gives a position of $\frac{1}{2}(15+1) = 8$, indicating that $x_{(8)}$ is the median value. When $n$ is an even number, the middle position is not a whole number and the median is the average of the two numbers either side of it. For example, when $n = 12$, the median position is $\frac{1}{2}(12+1) = 6\frac{1}{2}$, indicating that the median value is taken as halfway between $x_{(6)}$ and $x_{(7)}$.
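The position rule can be sketched in Python as follows. This is a minimal illustration rather than part of the course material: the function name `median_by_position` is our own, and the data are the coffee prices from Table 1.

```python
def median_by_position(values):
    """Median via the position rule: the middle item sits at position (n + 1) / 2."""
    ordered = sorted(values)          # x_(1), x_(2), ..., x_(n)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty batch is undefined")
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number, so a single middle value exists.
        position = (n + 1) // 2
        return ordered[position - 1]  # Python lists count from 0
    # Even n: (n + 1) / 2 falls halfway between positions n/2 and n/2 + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Coffee prices from Table 1 (pence), n = 15
coffee_prices = [299, 315, 268, 269, 295, 295, 369, 275,
                 268, 295, 279, 268, 268, 295, 305]

print(median_by_position(coffee_prices))  # 295
```

Sorting first is what makes the position meaningful: the rule refers to places in the ordered batch, not to the order in which the prices were collected.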
Example 3 uses prices of a digital camera to illustrate how the median is found for an even number of values.
Table 2 shows prices for a particular model of digital camera as given on a price comparison website in March 2012.
Table 2 Prices for a digital camera (to the nearest £)
60 |
70 |
53 |
81 |
74 |
85 |
90 |
79 |
65 |
70 |
If we put these prices in order and arrange them in a V-shape, they look like Figure 4.
Figure 4 Prices of 10 digital cameras
Because 10 is an even number, there is no single middle value in this batch: the position of the middle item is $\frac{1}{2}(10+1) = 5\frac{1}{2}$. The two values closest to the middle are those shown at the bottom of the V: $x_{(5)} = 70$ and $x_{(6)} = 74$. Their average is 72, so we say that the median price of this batch of camera prices is £72.
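As a cross-check, the same calculation can be sketched in Python using the standard library's `statistics.median`, which averages the two middle values when the batch size is even. The variable names are illustrative only.

```python
import statistics

# Camera prices from Table 2 (to the nearest £), n = 10
camera_prices = [60, 70, 53, 81, 74, 85, 90, 79, 65, 70]

ordered = sorted(camera_prices)
n = len(ordered)

# The two values either side of position (n + 1) / 2 = 5.5
x5, x6 = ordered[n // 2 - 1], ordered[n // 2]

print(ordered)                           # [53, 60, 65, 70, 70, 74, 79, 81, 85, 90]
print(x5, x6)                            # 70 74
print(statistics.median(camera_prices))  # 72.0
```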
The following activity asks you to find the median for an even number of values, using a stemplot of prices for small flat-screen televisions.
Figure 5 is a stemplot of data on the prices of small flat-screen televisions. (The prices have been rounded to the nearest £10. Originally all but one ended in 9.99, so in this case it makes reasonable sense to ignore the rounding and treat the data as if the prices were exact multiples of £10.) Find the median of these data.
Figure 5 Prices of all flat-screen televisions with a screen size of 24 inches or less on a major UK retailer’s website on a day in February 2012
This subsection can now be finished by using some of the methods we have met to examine a batch of data consisting of two parts, or sub-batches.
Table 3 presents the average price of gas, in pence per kilowatt hour (kWh), in 2010, for typical consumers on credit tariffs in 14 cities in the UK. These cities have been divided into two sub-batches: seven northern cities and seven southern cities. (Legally, at the time of writing, Ipswich is a town, not a city, but we shall ignore that distinction here.)
Table 3 Average gas prices in 14 cities
Northern cities | Average gas price (pence per kWh) | Southern cities | Average gas price (pence per kWh) |
---|---|---|---|
Aberdeen | 3.740 | Birmingham | 3.805 |
Edinburgh | 3.740 | Canterbury | 3.796 |
Leeds | 3.776 | Cardiff | 3.743 |
Liverpool | 3.801 | Ipswich | 3.760 |
Manchester | 3.801 | London | 3.818 |
Newcastle-upon-Tyne | 3.804 | Plymouth | 3.784 |
Nottingham | 3.767 | Southampton | 3.795 |
(a) Draw a stemplot of all 14 prices shown in the table.
(b) Draw separate stemplots for the seven prices for northern cities and the seven prices for southern cities.
(c) For each of these three batches (northern cities, southern cities and all cities) find the median and the range. Then use these figures to find the general level and the range of gas prices for typical consumers in the country as a whole, and to compare the north and south of the country.
Activity 2 illustrates two general properties of sub-batches:
The range of the complete batch is greater than or equal to the ranges of all the sub-batches.
The median of the complete batch is greater than or equal to the smallest median of a sub-batch and less than or equal to the largest median of a sub-batch.
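These two properties can be checked numerically for the gas prices in Table 3. The sketch below is only a check on this one data set, not a proof; the helper `batch_range` and the variable names are our own.

```python
import statistics

# Average gas prices from Table 3 (pence per kWh)
north = [3.740, 3.740, 3.776, 3.801, 3.801, 3.804, 3.767]
south = [3.805, 3.796, 3.743, 3.760, 3.818, 3.784, 3.795]
all_cities = north + south


def batch_range(values):
    """Range of a batch: upper extreme minus lower extreme."""
    return max(values) - min(values)


sub_batches = [north, south]
sub_ranges = [batch_range(b) for b in sub_batches]
sub_medians = [statistics.median(b) for b in sub_batches]

# Property 1: the range of the complete batch is at least every sub-batch range.
range_property = all(batch_range(all_cities) >= r for r in sub_ranges)

# Property 2: the median of the complete batch lies between the smallest
# and largest sub-batch medians.
median_property = (min(sub_medians)
                   <= statistics.median(all_cities)
                   <= max(sub_medians))

print(range_property, median_property)  # True True
```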
Another important measure of location is the arithmetic mean. (Here ‘arithmetic’ is pronounced with the stress on the third syllable, ‘met’, rather than on the second.)
The arithmetic mean is the sum of all the values in the batch divided by the size of the batch. More briefly, $\text{mean} = \dfrac{\text{sum}}{\text{size}}$.
There are other kinds of mean, such as the geometric mean and the harmonic mean, but in this course we shall be using only the arithmetic mean; the word mean will therefore normally be used for arithmetic mean.
Suppose we have a batch consisting of five values: 4, 8, 4, 2, 9. In this simple example, the mean is $\dfrac{4+8+4+2+9}{5} = \dfrac{27}{5} = 5.4$.
Note that in calculating the mean, the order in which the values are summed is irrelevant.
For a larger batch size, you may find it helpful to set out your calculations systematically in a table. However, in practice the raw data are usually fed directly into a computer or calculator. In general, it is a good idea to check your calculations by reworking them. If possible, use a different method in the reworking; for example, you could sum the numbers in the opposite order.
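The calculation for the batch of five values, together with the suggested check of summing in the opposite order, can be sketched in Python. The function name `mean` is our own choice for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by the batch size."""
    if len(values) == 0:
        raise ValueError("mean of an empty batch is undefined")
    return sum(values) / len(values)


batch = [4, 8, 4, 2, 9]

forward = mean(batch)                   # sum taken in the order given
backward = mean(list(reversed(batch)))  # rework: sum in the opposite order

print(forward)              # 5.4
print(forward == backward)  # True
```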
The formula ‘$\text{mean} = \text{sum}/\text{size}$’ can be expressed more concisely as follows. Referring to the values in the batch by $x$, the ‘sum’ can be written as $\sum x$. Here $\Sigma$ is the Greek (capital) letter Sigma, the Greek version of S, and is used in statistics to denote ‘the sum of’. Also, the symbol $\bar{x}$ is often used to denote the mean – and as you have already seen in stemplots, $n$ can be used to denote the batch size. (Some calculators use keys marked $\sum x$ and $\bar{x}$ to produce the sum and the mean of a batch directly.)
Using this notation, ‘$\text{mean} = \text{sum}/\text{size}$’ can be written as $\bar{x} = \dfrac{\sum x}{n}$.
In this course we shall normally round the mean to one more figure than the original data.
The prices of 20 small televisions were given in Activity 1 (Subsection 1.2). Find the mean of these prices. Round your answer appropriately (if necessary), given that the original data were rounded to the nearest £10.
Both the mean and median of a batch are useful indicators of the location of the values in the batch. They are, however, calculated in very different ways. To find the median you must first order the batch of data, and if you are not using a computer, you will often do the sorting by means of a stemplot. On the other hand, the major step in finding the mean consists of summing the values in the batch, and for this they do not need to be ordered.
For large batches, at least when you are not using a computer, it is often much quicker to sum the values in the batch than it is to order them. However, for small batches, like some of those you will be analysing in this course without a computer, it can be just as fast to calculate the median as it is to calculate the mean. Moreover, placing the batch values in order is not done solely to help calculate the median – there are many other uses. Drawing a stemplot to order the values also enables us to examine the general shape of the batch. In Section 3 you will read about some other uses of the stemplot.
Comparisons based on the method of calculation can be of great practical interest, but the rest of this subsection will consider more fundamental differences between the mean and the median – differences which should influence you when you are deciding which measure to use in summarising the general location of the values in a batch.
Many of the problems with the mean, as well as some advantages, lie in the fact that the precise value of every item in the batch enters into its calculation. In calculating the median, most of the data values come into the calculation only in terms of whether they are in the 50% above the median value or the 50% below it. If one of them changes slightly, but without moving into the other half of the batch, the median will not change. In particular, if the extreme values in the batch are made smaller or larger, this will have no effect on the value of the median – the median is resistant to outliers. In contrast, changes to the extremes could have an appreciable effect on the value of the mean, as the following examples show.
For the batch of coffee prices in Figure 1 (Subsection 1.2), the sum of the values is 4363p, so the mean is $\dfrac{4363}{15} \approx 290.9$p.
Suppose the highest and lowest coffee prices are both reduced, by a total of 57p between them, with every other price unchanged.
The median of this altered batch is the same as before, 295p. However, the sum of the values is now 4306p and so the mean is $\dfrac{4306}{15} \approx 287.1$p.
Suppose the highest two television prices in Activity 1 (Subsection 1.2) are altered to £350 and £400. The median, at £150, remains the same as that of the original batch, whereas the new mean is £174, compared with the original mean of £162.
Now, even with the very high prices of £350 and £400 for two televisions, the overall location of the main body of the data is still much the same as for the original batch of data. For the original batch the mean, £162, was a reasonably good measure of this. However, for the new batch the mean, £174, is much too high to be a representative measure since, as we can see from the stemplot in Activity 1, most of the values are below £174.
Example 6 is the subject of the following screencast. [Note that the reference to ‘Unit 2’ should be ‘this course’. Unit 2 is a reference to the Open University course from which this material is adapted.]
Screencast 1 Effects on the median and mean when data points change
A measure which is insensitive to changes in the values near the extremes is called a resistant measure.
The median is a resistant measure whereas the mean is sensitive.
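The contrast between a resistant and a sensitive measure can be sketched in Python using the coffee prices from Table 1. The altered extremes used here (240p and 340p) are hypothetical values chosen purely for illustration, picked so that the altered sum is 4306p; any reduction of the two extremes would leave the median unchanged in the same way.

```python
import statistics

# Coffee prices from Table 1 (pence)
coffee_prices = [299, 315, 268, 269, 295, 295, 369, 275,
                 268, 295, 279, 268, 268, 295, 305]

# Assumption for illustration: lower one 268p price to 240p and the
# highest price (369p) to 340p. These particular values are hypothetical.
altered = sorted(coffee_prices)
altered[0] = 240    # lowest value
altered[-1] = 340   # highest value

for label, batch in [("original", coffee_prices), ("altered", altered)]:
    print(label,
          "median:", statistics.median(batch),
          "mean:", round(statistics.mean(batch), 1))
# original median: 295 mean: 290.9
# altered median: 295 mean: 287.1
```

Only the mean moves, because every value enters its sum, whereas the median depends only on which value sits in the middle position.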
In the following activities, you can investigate some other ways in which the median is more resistant than the mean.
In Activity 2 (Subsection 1.2) you may have noticed that Cardiff and Ipswich had rather low gas prices compared to the other southern cities. Here you are going to examine the effect of deleting them from the batch of southern cities. Complete the following table and comment on your results.
Batch | Mean | Median |
---|---|---|
Seven southern cities | | |
Five southern cities (excluding Cardiff and Ipswich) | | |
Suppose the value for London had been misprinted as 8.318 instead of 3.818 (quite an easy mistake to make!). How would this affect your results for the batch of five southern cities (again omitting Cardiff and Ipswich)?
Batch | Mean | Median |
---|---|---|
Five cities (correct data) | | |
Five cities (with misprint) | | |
Suppose you wanted to use these values – the correct ones, of course – to estimate the average price of gas over the whole country. The simple arithmetic mean of the 14 values given in Table 3 (Subsection 1.2) would not allow for the fact that much more gas is consumed in London, at a relatively high price, than in other cities. To take account of this you would need to calculate what is known as a weighted arithmetic mean. Weighted means are the subject of Section 2.
The following exercises provide extra practice on the topics covered in Section 1.
For each of the following batches of data, find the median of the batch. (We shall also use these batches of data in some of the exercises in Section 3.)
(a) Percentage scores in arithmetic obtained by 33 school students.
Figure 8
(b) Prices of 26 digital televisions with 22- to 26-inch LED screens, quoted online by a large department store in February 2012. The prices have been rounded to the nearest pound (£).
170 |
180 |
190 |
200 |
220 |
229 |
230 |
230 |
230 |
230 |
250 |
269 |
269 |
270 |
279 |
299 |
300 |
300 |
315 |
320 |
349 |
350 |
400 |
429 |
649 |
699 |
Calculate the mean for each of the batches in Exercise 1.
In the data on prices for small televisions in Activity 1 (Subsection 1.2), the three highest-priced televisions were considerably more expensive than all the others (which all cost under £200). Suppose that in fact these prices had been for a different, larger type of television that should not have been in the batch. (In fact that is not the case – but this is only an exercise!) Leave these three prices out of the batch and calculate the median and the mean of the remaining prices.
How do these values compare with the original median (150) and mean (162)? What does this comparison demonstrate about how resistant the median and mean are?