1.4 The mean and median compared
Both the mean and median of a batch are useful indicators of the location of the values in the batch. They are, however, calculated in very different ways. To find the median you must first order the batch of data, and if you are not using a computer, you will often do the sorting by means of a stemplot. On the other hand, the major step in finding the mean consists of summing the values in the batch, and for this they do not need to be ordered.
For large batches, at least when you are not using a computer, it is often much quicker to sum the values in the batch than it is to order them. However, for small batches, like some of those you will be analysing in this course without a computer, it can be just as fast to calculate the median as it is to calculate the mean. Moreover, placing the batch values in order is not done solely to help calculate the median – there are many other uses. Drawing a stemplot to order the values also enables us to examine the general shape of the batch. In Section 3 you will read about some other uses of the stemplot.
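The two calculations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the batch of prices are hypothetical and are not taken from this course. It shows that the mean needs only a sum, whereas the median needs the batch to be ordered first.

```python
def mean(batch):
    """Sum the values and divide by the batch size; no ordering is needed."""
    return sum(batch) / len(batch)


def median(batch):
    """Order the batch, then take the middle value
    (or the mean of the two middle values for an even-sized batch)."""
    ordered = sorted(batch)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical batch of prices in pence (not data from this course).
prices = [120, 95, 150, 110, 135]

print(mean(prices))    # 122.0
print(median(prices))  # 120
```

Notice that the ordering step is the only part of the median calculation that takes any real work, which is why a stemplot is so convenient when you are working by hand.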
Comparisons based on the method of calculation can be of great practical interest, but the rest of this subsection will consider more fundamental differences between the mean and the median – differences which should influence you when you are deciding which measure to use in summarising the general location of the values in a batch.
Many of the problems with the mean, as well as some of its advantages, lie in the fact that the precise value of every item in the batch enters into its calculation. In calculating the median, most of the data values come into the calculation only in terms of whether they are in the 50% above the median value or the 50% below it. If one of them changes slightly, but without moving into the other half of the batch, the median will not change. In particular, if the extreme values in the batch are made smaller or larger while staying on the same side of the median, this will have no effect on the value of the median. In other words, the median is resistant to outliers. In contrast, changes to the extremes could have an appreciable effect on the value of the mean, as the following examples show.
Example 5 Changing the extreme coffee prices
For the batch of coffee prices in Figure 1 (Subsection 1.2), the sum of the values is 4363p, so the mean is
Suppose the highest and lowest coffee prices are reduced so that
The median of this altered batch is the same as before, 295p. However, the sum of the values is now 4306p and so the mean is
Example 6 Changing the small television prices
Suppose the highest two television prices in Activity 1 (Subsection 1.2) are altered to £350 and £400. The median, at £150, remains the same as that of the original batch, whereas the new mean is approximately £174, compared with the original mean of £162.
Now, even with the very high prices of £350 and £400 for two televisions, the overall location of the main body of the data is still much the same as for the original batch of data. For the original batch the mean, £162, was a reasonably good measure of this. However, for the new batch the mean, £174, is much too high to be a representative measure since, as we can see from the stemplot in Activity 1, most of the values are below £174.
A measure which is insensitive to changes in the values near the extremes is called a resistant measure.
The median is a resistant measure, whereas the mean is sensitive to such changes.
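If you have access to a computer, you can check this behaviour for yourself. The sketch below uses Python's standard `statistics` module on a hypothetical batch of seven values (these are not the coffee or television prices from the examples). Only the two highest values are altered, and they stay in the upper half of the batch.

```python
from statistics import mean, median

# Hypothetical batch (not data from this course).
original = [12, 15, 17, 20, 22, 25, 29]

# Alter only the two highest values; they remain above the median.
altered = [12, 15, 17, 20, 22, 60, 90]

print(median(original), median(altered))  # 20 20
print(mean(original), mean(altered))      # the mean rises from 20 to about 33.7
```

The median stays at 20 because the middle value of the ordered batch has not moved, whereas the mean is pulled upwards by the two altered values.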
In the following activities, you can investigate some other ways in which the median is more resistant than the mean.
Activity 4 Changing the gas prices
In Activity 2 (Subsection 1.2) you may have noticed that Cardiff and Ipswich had rather low gas prices compared to the other southern cities. Here you are going to examine the effect of deleting them from the batch of southern cities. Complete the following table and comment on your results.
| Batch | Mean (p per kWh) | Median (p per kWh) |
|---|---|---|
| Seven southern cities | | |
| Five southern cities (excluding Cardiff and Ipswich) | | |
Discussion
The completed table is:
| Batch | Mean (p per kWh) | Median (p per kWh) |
|---|---|---|
| Seven southern cities | 3.7859 | 3.795 |
| Five southern cities (excluding Cardiff and Ipswich) | 3.7996 | 3.796 |
Whereas deletion of Cardiff and Ipswich has the effect of increasing the mean price by 0.0137p per kWh, the median price increases by only 0.001p per kWh. This is what we would expect as, in general, the more resistant a measure is, the less it changes when a few extreme values are deleted.
Activity 5 A misprint in the gas prices
Suppose the value for London had been misprinted as 8.318 instead of 3.818 (quite an easy mistake to make!). How would this affect your results for the batch of five southern cities (again omitting Cardiff and Ipswich)?
| Batch | Mean (p per kWh) | Median (p per kWh) |
|---|---|---|
| Five cities (correct data) | | |
| Five cities (with misprint) | | |
Discussion
The completed table is:
| Batch | Mean (p per kWh) | Median (p per kWh) |
|---|---|---|
| Five cities (correct data) | 3.7996 | 3.796 |
| Five cities (with misprint) | 4.6996 | 3.796 |
Here the median is completely unaffected by the misprint, although the mean changes considerably.
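The size of the change in the mean can be worked out without re-adding all five values. The sketch below uses only the figures quoted in this activity; the variable names are simply chosen for illustration. Replacing one value changes the sum by the size of the error, so the mean changes by that error divided by the number of values in the batch.

```python
# Values quoted in Activity 5 (prices in p per kWh).
n = 5                     # five southern cities
correct_mean = 3.7996     # mean of the correct data
correct_value = 3.818     # London, as it should be
misprinted_value = 8.318  # London, as misprinted

# The sum changes by the error, so the mean changes by error / n.
shift = (misprinted_value - correct_value) / n
misprinted_mean = correct_mean + shift

print(round(shift, 4))            # 0.9
print(round(misprinted_mean, 4))  # 4.6996
```

The median does not move because London's price lies above the median of 3.796 both before and after the misprint, so the middle value of the ordered batch is the same in each case.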
Suppose you wanted to use these values – the correct ones, of course – to estimate the average price of gas over the whole country. The simple arithmetic mean of the 14 values given in Table 3 (Subsection 1.2) would not allow for the fact that much more gas is consumed in London, at a relatively high price, than in other cities. To take account of this you would need to calculate what is known as a weighted arithmetic mean. Weighted means are the subject of Section 2.
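As a brief preview of the idea, the sketch below compares a simple mean with a weighted mean. Both the prices and the consumption weights here are hypothetical, chosen only to show the effect, and the function name `weighted_mean` is an assumption made for this illustration. Each price is multiplied by its weight, and the total is divided by the sum of the weights.

```python
def weighted_mean(values, weights):
    """Return sum(weight * value) / sum(weight) for paired values and weights."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical prices (p per kWh) and hypothetical consumption weights.
prices = [3.8, 3.7]
consumption = [3, 1]  # the first city uses three times as much gas

simple = sum(prices) / len(prices)
weighted = weighted_mean(prices, consumption)

print(simple)    # simple mean: both cities count equally
print(weighted)  # weighted mean: pulled towards the first city's price
```

Here the weighted mean lies closer to the price in the city that consumes more gas, which is the adjustment the simple mean fails to make.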