2.1.3 Box-and-whisker plot

Another way of displaying information about the spread of data is the box-and-whisker plot (also referred to as the boxplot). A boxplot is a graphical display of the five-number summary (minimum, lower quartile, median, upper quartile and maximum) of continuous or discrete data. It consists of:

  • a central box that spans from the lower quartile Q1 (25th percentile) to the upper quartile Q3 (75th percentile); the distance from Q1 to Q3 is known as the interquartile range (IQR)
  • a line in the box that marks the median (50th percentile)
  • lines (whiskers) that extend from the box out to either the smallest (minimum) and largest (maximum) observations, excluding outliers, or the most extreme observations that lie within 1.5 times the interquartile range of the box on each side, with anything beyond that distance plotted as an outlier (if it is ambiguous which method is used, it is normally mentioned in the figure’s legend)
  • outliers, which are data values that are far away from other data values. On a boxplot, outliers are identified by a symbol such as a dot or an asterisk.
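The components listed above can be worked out by hand or with a few lines of code. The following is a minimal sketch in Python, not a definitive implementation: the function name boxplot_summary and the parameter k are hypothetical, the data values are made up for illustration, and the quartile method used here ("inclusive" interpolation) is only one of several conventions, so other software may report slightly different quartiles for the same data.

```python
from statistics import quantiles


def boxplot_summary(values, k=1.5):
    """Return the quantities needed to draw a boxplot (illustrative sketch)."""
    data = sorted(values)

    # Quartiles: Q1 (25th percentile), median (50th) and Q3 (75th).
    # Assumption: "inclusive" interpolation; other conventions exist.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range

    # Fences: values beyond k * IQR from the box are treated as outliers.
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up example data with one unusually large value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For this made-up dataset the box spans 3.25 to 7.75 with the median at 5.5, the whiskers end at 1 and 9, and the value 30 is flagged as an outlier because it lies more than 1.5 times the IQR above Q3.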

Boxplots are most beneficial when used for a side-by-side comparison of more than one distribution.

Activity 5: Understanding box-and-whisker plots

Timing: Allow about 15 minutes

Look at the boxplots in Figure 5. Can you annotate one of them to show the five-number summary for a distribution?

Figure 5 Biofilm formation by Stenotrophomonas maltophilia isolated from patient samples according to patient and sample types. Biofilm biomass, assessed by spectrophotometric assay after crystal violet assay, was stratified according to patients with or without cystic fibrosis (CF, non-CF) and (B) sample type. (Significance level from Mann-Whitney test: * p

Answer

Figure 6 Annotated Figure 5

The box extends from the first quartile (Q1) to the third quartile (Q3), and the line in the middle of the box is plotted at the median. The lines extending from the box are called whiskers; they reach to the minimum and maximum values in the dataset, excluding outliers. Note that the asterisks in Figure 5 and Figure 6 are not outliers; rather, they indicate the statistical significance of the results.

The strengths and limitations of boxplots are listed below:

Table 9 Strengths and limitations of boxplots
Strengths | Limitations
Summarises large datasets | Exact values cannot be read
Summarises the distribution of the data, including its symmetry and skewness | Emphasises the tails of the distribution, which are the least certain points in a dataset
Shows outliers, unlike many other graphs | Does not show many details of the distribution, so should be used in combination with a histogram
Allows the distributions of several datasets to be compared |
Important tool for exploratory data analysis |

2.1.4 Scatter plots