Prices, location and spread

1.4 The mean and median compared

Both the mean and median of a batch are useful indicators of the location of the values in the batch. They are, however, calculated in very different ways. To find the median you must first order the batch of data, and if you are not using a computer, you will often do the sorting by means of a stemplot. On the other hand, the major step in finding the mean consists of summing the values in the batch, and for this they do not need to be ordered.

For large batches, at least when you are not using a computer, it is often much quicker to sum the values in the batch than it is to order them. However, for small batches, like some of those you will be analysing in this course without a computer, it can be just as fast to calculate the median as it is to calculate the mean. Moreover, placing the batch values in order is not done solely to help calculate the median – there are many other uses. Drawing a stemplot to order the values also enables us to examine the general shape of the batch. In Section 3 you will read about some other uses of the stemplot.

Comparisons based on the method of calculation can be of great practical interest, but the rest of this subsection will consider more fundamental differences between the mean and the median – differences which should influence you when you are deciding which measure to use in summarising the general location of the values in a batch.

Many of the problems with the mean, as well as some advantages, lie in the fact that the precise value of every item in the batch enters into its calculation. In calculating the median, most of the data values come into the calculation only in terms of whether they are in the 50% above the median value or the 50% below it. If one of them changes slightly, but without moving into the other half of the batch, the median will not change. In particular, if the extreme values in the batch are made smaller or larger, this will have no effect on the value of the median – the median is resistant to outliers. In contrast, changes to the extremes could have an appreciable effect on the value of the mean, as the following examples show.

Example 5 Changing the extreme coffee prices

For the batch of coffee prices in Figure 1 (Subsection 1.2), the sum of the values is 4363p, so the mean is

\[ \frac{4363\text{p}}{15} \simeq 290.9\text{p}. \]

Suppose the highest and lowest coffee prices are reduced so that

\[ x_{(1)} = 240 \quad\text{and}\quad x_{(15)} = 340. \]

The median of this altered batch is the same as before, 295p. However, the sum of the values is now 4306p and so the mean is

\[ \frac{4306\text{p}}{15} \simeq 287.1\text{p}. \]
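The arithmetic in Example 5 can be checked with a short Python sketch. It uses only the sums and the batch size quoted above; the variable names are illustrative and not part of the course material.

```python
# Figures quoted in Example 5 (prices in pence).
BATCH_SIZE = 15
original_sum = 4363
altered_sum = 4306

# The mean is the sum of the values divided by the batch size.
original_mean = original_sum / BATCH_SIZE
altered_mean = altered_sum / BATCH_SIZE
mean_change = altered_mean - original_mean

print(f"original mean:  {original_mean:.1f}p")
print(f"altered mean:   {altered_mean:.1f}p")
print(f"change in mean: {mean_change:.1f}p")
```

Lowering the two extreme prices takes 57p off the sum, so the mean falls by 57/15 = 3.8p, while the median stays at 295p.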

Example 6 Changing the small television prices

Suppose the highest two television prices in Activity 1 (Subsection 1.2) are altered to £350 and £400. The median, at £150, remains the same as that of the original batch, whereas the new mean is

\[ \frac{\pounds 3470}{20} = \pounds 173.5 \simeq \pounds 174 \]

compared with the original mean of £162.

Now, even with the very high prices of £350 and £400 for two televisions, the overall location of the main body of the data is still much the same as for the original batch of data. For the original batch the mean, £162, was a reasonably good measure of this. However, for the new batch the mean, £174, is much too high to be a representative measure since, as we can see from the stemplot in Activity 1, most of the values are below £174.

Example 6 is the subject of the following screencast. [Note that the screencast refers to ‘Unit 2’, which is the Open University course from which this material is adapted; read it as ‘this course’.]

Screencast 1 Effects on the median and mean when data points change

A measure which is insensitive to changes in the values near the extremes is called a resistant measure.

The median is a resistant measure, whereas the mean is sensitive to such changes.
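As a minimal sketch of what resistance means in practice, the Python below alters only the two extreme values of a small batch and compares the effect on each measure. The batch is hypothetical (it is not data from this course); `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical batch of nine values, already in order (not course data).
batch = [12, 15, 17, 18, 20, 22, 23, 25, 28]

# Copy the batch and change only the lowest and highest values.
altered = list(batch)
altered[0] = 2     # lowest value made smaller
altered[-1] = 98   # highest value made much larger

print("median:", median(batch), "->", median(altered))
print("mean:  ", mean(batch), "->", round(mean(altered), 1))
```

For this made-up batch the median stays at 20, because the middle value is untouched, while the mean moves from 20 to about 26.7.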

In the following activities, you can investigate some other ways in which the median is more resistant than the mean.

Activity 4 Changing the gas prices

In Activity 2 (Subsection 1.2) you may have noticed that Cardiff and Ipswich had rather low gas prices compared to the other southern cities. Here you are going to examine the effect of deleting them from the batch of southern cities. Complete the following table and comment on your results.

| Batch | Mean (p per kWh) | Median (p per kWh) |
|---|---|---|
| Seven southern cities | | |
| Five southern cities (excluding Cardiff and Ipswich) | | |

Discussion

The completed table is:

| Batch | Mean (p per kWh) | Median (p per kWh) |
|---|---|---|
| Seven southern cities | 3.7859 | 3.795 |
| Five southern cities (excluding Cardiff and Ipswich) | 3.7996 | 3.796 |

Whereas deletion of Cardiff and Ipswich has the effect of increasing the mean price by 0.0137p per kWh, the median price increases by only 0.001p per kWh. This is what we would expect as, in general, the more resistant a measure is, the less it changes when a few extreme values are deleted.

Activity 5 A misprint in the gas prices

Suppose the value for London had been misprinted as 8.318 instead of 3.818 (quite an easy mistake to make!). How would this affect your results for the batch of five southern cities (again omitting Cardiff and Ipswich)?

| Batch | Mean (p per kWh) | Median (p per kWh) |
|---|---|---|
| Five cities (correct data) | | |
| Five cities (with misprint) | | |

Discussion

The completed table is:

| Batch | Mean (p per kWh) | Median (p per kWh) |
|---|---|---|
| Five cities (correct data) | 3.7996 | 3.796 |
| Five cities (with misprint) | 4.6996 | 3.796 |

Here the median is completely unaffected by the misprint, although the mean changes considerably.
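The size of the change in the mean can be worked out without re-adding the whole batch: replacing one value shifts the sum by the size of the error, so the mean shifts by that error divided by the batch size. The sketch below uses only the figures quoted in this activity; the variable names are illustrative.

```python
# Figures quoted in Activity 5 (prices in p per kWh).
BATCH_SIZE = 5
correct_mean = 3.7996
correct_median = 3.796
correct_london = 3.818
misprinted_london = 8.318

# One value changes, so the mean shifts by (error / batch size).
shift = (misprinted_london - correct_london) / BATCH_SIZE
misprinted_mean = correct_mean + shift

# The London value lies above the median both before and after the
# misprint, so it never crosses into the lower half of the batch.
stays_in_upper_half = (correct_london > correct_median
                       and misprinted_london > correct_median)

print(f"shift in mean:   {shift:.4f}")
print(f"misprinted mean: {misprinted_mean:.4f}")
print(f"median unchanged: {stays_in_upper_half}")
```

The error of 4.5 is shared across five values, which gives the shift of 0.9 in the mean. The median does not move because the misprinted value stays on the same side of it.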

Suppose you wanted to use these values – the correct ones, of course – to estimate the average price of gas over the whole country. The simple arithmetic mean of the 14 values given in Table 3 (Subsection 1.2) would not allow for the fact that much more gas is consumed in London, at a relatively high price, than in other cities. To take account of this you would need to calculate what is known as a weighted arithmetic mean. Weighted means are the subject of Section 2.
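As a rough sketch only, a weighted mean multiplies each value by its weight, sums the results, and divides by the total weight. The prices and weights below are hypothetical (they are not taken from Table 3), and `weighted_mean` is an illustrative helper name, not a function from the course.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) / sum(weight)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical prices (p per kWh) and relative consumption weights.
prices = [4.0, 3.0, 3.5]
weights = [6, 1, 1]   # the first city is assumed to use far more gas

simple = sum(prices) / len(prices)
weighted = weighted_mean(prices, weights)

print(f"simple mean:   {simple:.4f}")
print(f"weighted mean: {weighted:.4f}")
```

With these made-up figures the weighted mean (3.8125) is pulled towards the price in the heavily weighted city, whereas the simple mean (3.5) treats all three cities alike.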