Prices, location and spread

3.3 The five-figure summary and boxplots

As well as giving us a new measure of spread – the interquartile range – the quartiles are important figures in themselves. Our M-shaped diagram, Figure 19, gives five important points which help to summarise the shape of a distribution: the median, the two quartiles and the two extremes.

Figure 19 Values in a five-figure summary

Values in a five-figure summary. There are four lines, forming the shape of a letter M with sloping sides. At the base of the line on the left is capital E subscript capital L, indicating the lower extreme, the lowest value in the data. At the top of that line is capital Q subscript 1, indicating the lower quartile. The line then slopes down and to the right. At the end of the second line is capital M, indicating the median. This is on the same horizontal level as capital E subscript capital L. The line then rises and slopes to the right. It ends at capital Q subscript 3, indicating the upper quartile, which is at the same horizontal level as capital Q subscript 1. The line then falls, sloping to the right. At the end of the line is capital E subscript capital U, indicating the upper extreme, the highest value in the data. This is at the same horizontal level as capital M.

These are conveniently displayed in the following form, called the five-figure summary of the batch.

Five-figure summary

Figure 20

A five-figure summary which is a diagrammatic representation showing the batch size, n, the median capital M, the lower quartile capital Q subscript 1, the upper quartile capital Q subscript 3, the lower extreme capital E subscript capital L, and the upper extreme capital E subscript capital U. The diagram forms three sides of a rectangle, with the bottom line missing. It therefore has a vertical line to the left, a horizontal line across the top and a vertical line to the right. To the left of the left vertical line is written n. Towards the bottom of the line and to its right is written capital E subscript capital L and capital Q subscript 1, with capital Q subscript 1 being above capital E subscript capital L. Beneath the middle of the horizontal line is written capital M. To the left of the second vertical line but level with capital Q subscript 1 is written capital Q subscript 3. Below that, and level with capital E subscript capital L, is written capital E subscript capital U.

Example 18 Five-figure summary for television price data

For the television price data, we have $n = 20$, $M = 150$, $Q_1 = 130$, $Q_3 = 180$, $E_L = 90$ and $E_U = 270$. (You last saw these data in Figure 16, Subsection 3.2.)

Therefore, the five-figure summary of this batch is

Figure 21

A five-figure summary. The diagram forms three sides of a rectangle, with the bottom line missing. It therefore has a vertical line to the left, a horizontal line across the top and a vertical line to the right. To the left of the left vertical line is written n = 20. Towards the bottom of the left vertical line and to its right is written 90 and above that 130. Beneath the middle of the horizontal line is written 150. To the left of the second vertical line and level with 130 is written 180. Below that and level with 90 is written 270.

This diagram contains the following information about the batch of prices.

The general level of prices, as measured by the median, is £150.
The individual prices vary from £90 to £270.
About 25% of the prices were less than £130.
About 25% of the prices were more than £180.
About 50% of the prices were between £130 and £180.

We hope you agree that the five-figure summary is quite an efficient way of presenting a summary of a batch of data.
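The statements above can be read straight off the five figures. As a minimal sketch (the class and attribute names are our own illustration, not part of the course), the following Python stores the summary from Figure 21 and derives the range and the interquartile range from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveFigureSummary:
    """Batch size plus the five figures; all names are illustrative only."""
    n: int
    e_l: float  # lower extreme
    q1: float   # lower quartile
    m: float    # median
    q3: float   # upper quartile
    e_u: float  # upper extreme

    @property
    def data_range(self) -> float:
        # Distance between the two extremes.
        return self.e_u - self.e_l

    @property
    def iqr(self) -> float:
        # Interquartile range: distance between the two quartiles.
        return self.q3 - self.q1


# Television prices in pounds, as given in Figure 21.
tv = FiveFigureSummary(n=20, e_l=90, q1=130, m=150, q3=180, e_u=270)

print(f"Median price: {tv.m} pounds")
print(f"Prices run from {tv.e_l} to {tv.e_u} pounds (range {tv.data_range})")
print(f"About half the prices lie between {tv.q1} and {tv.q3} pounds (IQR {tv.iqr})")
```

Keeping the batch size alongside the five figures mirrors the layout of the diagram, where n is written to the left of the summary.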

The five values in a five-figure summary can be very effectively presented in a special diagram called a boxplot. For the 14 gas prices (Figure 15, Subsection 3.2) the diagram looks like Figure 22.

Figure 22 Boxplot of batch of 14 gas prices

A boxplot with a horizontal scale with an arrow head, pointing right, at the right-hand end. It is labelled pence per kilowatt hour. The scale starts just before the first marked point of 3.74 and is then marked in four intervals of 0.02, ending with 3.82. The boxplot is drawn above and parallel to the line. The first whisker starts at the lower extreme, level with 3.74. The box starts at the lower quartile, 3.756 and ends at the upper quartile, 3.802. Within the box but nearer the right-hand end is a vertical line indicating the median at 3.790. The second whisker starts at the midpoint of the right-hand end of the box and stretches to 3.818, as far as the upper extreme.

The central feature of this diagram is a box – hence the name boxplot. The box extends from the lower quartile (at the left-hand edge of the box) to the upper quartile (the right-hand edge). This part of the diagram contains 50% of the values in the batch. The length of this box is thus the interquartile range.

Outside the box are two whiskers. (Boxplots are sometimes called box-and-whisker diagrams.) In many cases, such as in Figure 22, the whiskers extend all the way out to the extremes. Each whisker then covers the end 25% of the batch and the distance between the two whisker-ends is then the range. (You will see examples later where the whiskers do not go right out to the extremes.)

So far we have dealt with four figures from the five-figure summary: the two quartiles and the two extremes. The remaining figure is perhaps the most important: it is the median, whose position is shown by putting a vertical line through the box.

Thus a boxplot shows clearly the division of the data into four parts: the two whiskers and the two sections of the box; these are the four parts of the M-shaped diagram and each contains (approximately) 25% of values in the batch (see Figure 23).
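To see how the five figures fix the whole picture, here is a rough sketch that draws a one-line text version of a boxplot whose whiskers run out to the extremes. The function name, the drawing characters and the default width of 60 columns are all arbitrary choices of ours; real plotting software will of course do this far better.

```python
def text_boxplot(e_l, q1, m, q3, e_u, width=60):
    """Return a one-line text sketch of a boxplot (whiskers to the extremes)."""
    def col(x):
        # Map a data value to a column index between 0 and width - 1.
        return round((x - e_l) / (e_u - e_l) * (width - 1))

    cells = [" "] * width
    for i in range(col(e_l), col(e_u) + 1):
        cells[i] = "-"        # whiskers, drawn first across the whole range
    for i in range(col(q1), col(q3) + 1):
        cells[i] = "="        # box from the lower to the upper quartile
    cells[col(q1)] = "["      # left-hand edge of the box
    cells[col(q3)] = "]"      # right-hand edge of the box
    cells[col(m)] = "|"       # median line
    return "".join(cells)


# Five-figure summary of the 14 gas prices, as shown in Figure 22.
gas = text_boxplot(3.740, 3.756, 3.790, 3.802, 3.818)
print(gas)
```

The median is drawn last, so if it coincides with a quartile its line simply replaces that edge of the box.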

John W. Tukey (1915–2000), inventor of the five-figure summary and boxplot

John Tukey was a prominent and prolific US statistician, based at Princeton University and Bell Laboratories. As well as working in some very technical areas, he was a great promoter of simple ways of picturing and summarising data, and invented both the five-figure summary and the boxplot (except that he called them the ‘five-number summary’ and the ‘box-and-whisker plot’).

He had what has been described as an ‘unusual’ lecturing style. The statistician Peter McCullagh describes a lecture he gave at Imperial College, London in 1977:

Tukey ambled to the podium, a great bear of a man dressed in baggy pants and a black knitted shirt. These might once have been a matching pair, but the vintage was such that it was hard to tell. …The words came …, not many, like overweight parcels, delivered at a slow unfaltering pace. …Tukey turned to face the audience …. ‘Comments, queries, suggestions?’ he asked …. As he waited for a response, he clambered onto the podium and manoeuvred until he was sitting cross-legged facing the audience. …We in the audience sat like spectators at the zoo waiting for the great bear to move or say something. But the great bear appeared to be doing the same thing, and the feeling was not comfortable. …After a long while, …he extracted from his pocket a bag of dried prunes and proceeded to eat them in silence, one by one. The war of nerves continued …four prunes, five prunes. …How many prunes would it take to end the silence?

(Source: McCullagh, P. (2003) ‘John Wilder Tukey’, Biographical Memoirs of Fellows of the Royal Society, vol. 49, pp. 537–55.)

Figure 23 A standard boxplot with annotation

A boxplot which consists of a line, a rectangle, known as a box, and a second line. The first line, known as a whisker, starts at capital E subscript capital L, the lower extreme and leads to the midpoint of the left side of the box. Immediately above and at the corner of the box is written capital Q subscript 1. This point is the lower quartile. The box stretches for some way to the right, ending at the upper quartile, with capital Q subscript 3 written above the end of the box and on the same horizontal level as capital Q subscript 1. A second horizontal line, also known as a whisker, starts at the midpoint of the right-hand end of the box and stretches as far as capital E subscript capital U, the upper extreme. The median is marked by capital M above the box at the appropriate position and by a vertical line within the box. Beneath the boxplot are 4 horizontal curly brackets, each of which has 25% written under it. The first stretches from immediately beneath capital E subscript capital L to the start of the box. The second bracket stretches from the start of the box to level with the vertical line level indicating the position of the median. The third stretches from the line level with the median to level with capital Q subscript 3 and the last stretches from level with capital Q subscript 3 to level with capital E subscript capital U.

A typical boxplot looks something like Figure 23 because in most batches of data the values are more densely packed in the middle of the batch and are less densely packed in the extremes. This means that each whisker is usually longer than half the length of the box. This is illustrated again in the next example.

Example 19 Boxplot for the prices of small televisions

The boxplot for the batch of 20 television prices (last worked with in Example 18) is shown in Figure 24.

Figure 24 Boxplot of batch of 20 television prices

The horizontal scale is marked from just before 100 to 275 in intervals of 25 units. The scale is labelled pounds sterling. The boxplot shows that the lower extreme is less than 100. The whisker leads to the lower quartile, at the start of the box. This occurs just past 125. The box contains a vertical line, indicating the position of the median. This occurs at 150. The box ends at the upper quartile, which occurs just after 175. The right-hand whisker ends at 250. There is then a gap. Farther on, but at the same level as the whisker, is an asterisk.

You can see that each whisker is longer than half the length of the box.

However, this boxplot has a new feature. The whisker on the left goes right down to the lower extreme, but the whisker on the right does not reach the upper extreme. The highest data value, 270, which might be regarded as an outlier, is marked separately with a star. The whisker then extends only far enough to cover the data values that are not extreme enough to be regarded as potential outliers. The highest of these values is 250.

(This course does not describe the rule to decide which data values (if any) can be regarded as potential outliers that are plotted separately on the diagram. This is another issue that may be dealt with differently by different authors and different software.)
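Although the course leaves the rule open, it may help to see one in action. A widely used convention, which is an assumption here and not necessarily the rule behind Figure 24, treats any value lying more than 1.5 interquartile ranges beyond a quartile as a potential outlier. A minimal sketch of that convention, with a function name of our own choosing:

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) cut-offs placed k interquartile ranges beyond the quartiles.

    The multiplier k = 1.5 is a common convention, assumed here for illustration.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of the television prices (Figure 21), in pounds.
lower, upper = outlier_fences(130, 180)

# Only the three values named in the text; the full batch is not listed in this section.
named_values = [90, 250, 270]
flagged = [x for x in named_values if x < lower or x > upper]

print(lower, upper)
print(flagged)
```

Under this assumed rule the cut-offs are 55 and 255, so 270 is flagged while 250 and 90 are not, which agrees with what Figure 24 shows. Other authors and software may use different cut-offs.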

Example 19 is the subject of the following screencast. [Note that the reference to ‘Unit 2’ should be ‘this course’ and ‘Figure 18’ should be ‘Figure 23’. Unit 2 and Figure 18 are references to the Open University course from which this material is adapted.]

Transcript: Screencast 4 Interpreting a boxplot

INSTRUCTOR

In this screencast, I’m going to talk about interpreting a boxplot. And what we have here is an example of a boxplot. And it happens to be Figure 18 from Subsection 3.3 of Unit 2. And it’s a boxplot of the small television prices. And you can see here that television prices are given in pounds, and they go from just under £100 up to £275.

The first thing on the boxplot to look at is the box itself – in particular, to look at the ends of the box. The end on the left hand side shows us where the lower quartile is. And here is about £130. The end on the right hand side shows us where the upper quartile is – Q3. And this translates to about £180. So the lower quartile is about £130, and the upper quartile is about £180.

The line in the middle of the box shows us where the median is. And this is about £150. So the price of the median small television is £150. Notice in this example, it is quite clear where the line is in the middle of the box. There are some examples where the median is the same as the lower quartile or where the median is the same as the upper quartile. And then you won’t actually see a line in the box. The line will be at one end or the other.

The other thing to notice on the boxplot are the two whiskers. So there’s a whisker on the right hand side and a whisker on the left hand side. The whisker on the right hand side shows us where the values that are high but not too high are. Similarly, the whisker on the left hand side shows us where the values that are low but not too low are.

And finally, notice there’s one point here marked all by itself. And this is a value that we wonder whether it’s too high. In other words, we’re marking this one out as a potential outlier. This shows all the elements that are on a boxplot. And one thing we can use these elements for is to say something about the symmetry of the data.

And one thing we can look at is where the median is relative to the two ends of the box. So here we notice that the left hand side is short relative to the right hand side. We can look at the whiskers in the same way, and notice that the whisker on the left hand side is relatively short. And the whisker on the right hand side is relatively long.

And both these observations together suggest that the data are right-skew. The data tends to be more spread out on the right hand side of the median relative to the data on the left hand side of the median. And notice, in doing this, we haven’t actually taken account of the outlier. If we took the outlier into account as well, this would only emphasise more that the data are right-skew. Because this adds to the impression that the data are more spread out to the right of the median relative to the left of the median.

End transcript: Screencast 4 Interpreting a boxplot

One important use of boxplots is to picture and describe the overall shape of a batch of data.

Example 20 Skew televisions

The stemplot of small television prices, last seen in Figure 16 (Subsection 3.2), shows a lack of symmetry. Since the higher values are more spread out than the lower values, the data are right-skew.

The boxplot of these data, given in Figure 24, also shows this right-skew fairly clearly. In the box, the right-hand part (corresponding to higher prices) is rather longer than the left-hand part, and the right-hand whisker is longer than the left-hand whisker.

Activity 13 Skew gas prices?

A stemplot of the gas price data from Activity 2 (Subsection 1.2) is shown, yet again, in Figure 25.

Figure 25 Stemplot of 14 gas prices

A stemplot with 8 levels, which start at 374 and end at 381. Level 374 has three leaves, 0, 0, 3. Level 375 has no leaves. Level 376 has two leaves, 0, 7. Level 377 has one leaf, 6. Level 378 also has one leaf, 4. Level 379 has two leaves, 5, 6. Level 380 has four leaves, 1, 1, 4, 5. Level 381 has one leaf, 8. Beneath the stemplot is written n = 14, followed by 374 vertical line 0 represents 3.740 pence per kilowatt hour. The vertical line lies horizontally between the 374 and the 0.

(a) Prepare a five-figure summary of the batch.

Discussion

All the necessary figures have already been calculated. You found the median (3.790) in Activity 2 and the quartiles ($Q_1 = 3.756$, $Q_3 = 3.802$) in Activity 10. The extremes ($E_L = 3.740$, $E_U = 3.818$) and the batch size ($n = 14$) are clearly shown in the stemplot.

So the five-figure summary is as follows:

Figure 26

A five-figure summary. The diagram forms three sides of a rectangle, with the bottom line missing. It therefore has a vertical line to the left, a horizontal line across the top and a vertical line to the right. To the left of the left vertical line is written n = 14. Towards the bottom of the line and to the right is written 3.740 and above that 3.756. Beneath the middle of the horizontal line is written 3.790. To the left of the second vertical line and level with 3.756 is written 3.802. Below that, level with 3.740, is written 3.818.
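For readers who want to check these figures by computer, here is a minimal sketch using Python's standard statistics module. The data are read off the stemplot in Figure 25 and held as whole thousandths of a penny so that the arithmetic is exact. The choice of method="exclusive" is our assumption: it reproduces the published quartiles for this batch, but the course's own rule is the one you used in Activity 10, and other software may interpolate differently.

```python
import statistics

# Gas prices from the stemplot in Figure 25, in thousandths of a penny per kWh
# (so 3740 stands for 3.740 pence per kilowatt hour).
gas = [3740, 3740, 3743, 3760, 3767, 3776, 3784,
       3795, 3796, 3801, 3801, 3804, 3805, 3818]

# Quartile cut points; the "exclusive" method is an assumed choice (see above).
q1, m, q3 = statistics.quantiles(gas, n=4, method="exclusive")

summary = {
    "n": len(gas),
    "E_L": min(gas),
    "Q1": q1,
    "M": m,
    "Q3": q3,
    "E_U": max(gas),
}

# Round to the nearest thousandth of a penny for comparison with Figure 26.
# Python rounds an exact half to the even neighbour, which gives 3790 for the median.
rounded = [round(x) for x in (q1, m, q3)]

print(summary)
print(rounded)
```

Dividing the rounded values by 1000 gives 3.756, 3.790 and 3.802, matching the quartiles and median in the summary.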

(b) Figure 27 shows the boxplot of these data that you have already seen in Figure 22. What do the stemplot and boxplot tell us about the symmetry and/or skewness of the batch?

Figure 27 Boxplot of batch of 14 gas prices

Boxplot of batch of 14 gas prices, previously shown as Figure 22. There is a horizontal scale with an arrow head, pointing right, at the right-hand end. It is labelled pence per kilowatt hour. The scale starts just before the first marked point of 3.74 and is then marked in four intervals of 0.02, ending with 3.82. The boxplot is drawn above and parallel to the line. The first whisker starts at the lower extreme, level with 3.74. The box starts at the lower quartile, 3.756 and ends at the upper quartile, 3.802. Within the box but nearer the right-hand end is a vertical line indicating the median at 3.790. The second whisker starts at the midpoint of the right-hand end of the box and stretches to 3.818, as far as the upper extreme.

Discussion

Looking at the stemplot, on the whole the lower values are more spread out, indicating that the data are not symmetric and are left-skew.

The central box of the boxplot again shows left skewness, with the left-hand part of the box being clearly longer than the right-hand part. However, this skewness does not show up in the lengths of the whiskers in this batch – they are both the same length.
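The comparison in this discussion comes down to four subtractions. A minimal sketch (the function and key names are our own), again working in thousandths of a penny so that the differences are exact:

```python
def boxplot_parts(e_l, q1, m, q3, e_u):
    """Lengths of the four parts of a boxplot whose whiskers reach the extremes."""
    return {
        "left whisker": q1 - e_l,
        "left box": m - q1,
        "right box": q3 - m,
        "right whisker": e_u - q3,
    }


# Five-figure summary of the gas prices (Figure 26), in thousandths of a penny.
parts = boxplot_parts(3740, 3756, 3790, 3802, 3818)

for name, length in parts.items():
    print(f"{name}: {length}")
```

The left-hand part of the box (34) is nearly three times the right-hand part (12), while the two whiskers are both 16 long, which is the pattern described above.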

Example 21 Camera prices: skew or not?

In Example 20 and Activity 13 you saw how boxplots look for batches of data that are right-skew or left-skew. What happens in a batch that is more symmetrical?

For the small batch of camera prices from Table 2 (Subsection 1.2), a (stretched) stemplot is shown in Figure 28.

Figure 28 Stemplot of ten camera prices

A stemplot with 9 levels, which are 5, 5, 6, 6, 7, 7, 8, 8, 9. The first level 5 has one leaf, 3. The second level 5 has no leaves. The first level 6 has one leaf, 0. The second level 6 has one leaf, 5. The first level 7 has three leaves, 0, 0, 4. The second level 7 has one leaf, 9. The first level 8 has one leaf, 1 and the second level 8 also has one leaf, 5. Level 9 has one leaf, 0. Beneath the stemplot is written n = 10, followed by 5 vertical line 3 represents 53 pounds sterling. The vertical line sits horizontally between the 5 and the 3.

The stemplot looks reasonably symmetric.

A boxplot of the data, Figure 29, confirms the impression of symmetry. The two parts of the box are roughly equal in length, and the two whiskers are also roughly equal in length.

Figure 29 Boxplot of batch of ten camera prices

A boxplot with a horizontal scale with an arrow head, pointing right, at the right-hand end. It is labelled pounds sterling. The scale starts just before the first marked point of 50 and is then marked in four intervals of 10, ending with 90. The boxplot is drawn above and parallel to the line. The first whisker starts with capital E subscript capital L at 53 and ends at capital Q subscript 1, the lower quartile, 65. The box ends at capital Q subscript 3, the upper quartile, 81. Near the centre of the box is a vertical line indicating the median at 72. The second whisker starts at the midpoint of the right-hand end of the box and stretches to capital E subscript capital U at 90.

You have now spent quite a lot of time looking at various ways of investigating prices and, in particular, at methods of measuring the location and spread of the prices of particular commodities.

In order to begin to answer our question, Are people getting better or worse off?, we need to know not just location (and spread) of prices but also how these prices are changing from year to year. That is the subject of the rest of this course.