5.6 Quartiles and the interquartile range
The first alternative measure of dispersion we shall discuss is the interquartile range: this is the difference between summary measures known as the lower and upper quartiles. The quartiles are simple in concept: if the median is regarded as the middle data point, so that it splits the data in half, the quartiles similarly split the data into quarters. This is, of course, an over-simplification. With an even number of data points, the median is defined to be the average of the middle two: defining quartiles is a little more complicated.
It would be convenient to express our wordy definition of the median in a concise symbolic form, and this is easy to do. Any data sample of size n may be written as a list of numbers
x_1, x_2, x_3, ..., x_n
In order to calculate the sample median it is necessary to sort the data so that they are written in order of increasing size. The sorted list can then be written as
x (1), x (2), x (3), ..., x (n)
where x (1) is the smallest value in the original list (the minimum) and x (n) is the largest (the maximum). In general, the notation x (p) is used to mean the pth value when the data are arranged in order of increasing size. Each successive item in the ordered list is greater than or equal to the previous item. For instance, the list of six data items
7, 1, 3, 6, 3, 7
may be ordered as
1, 3, 3, 6, 7, 7
So, for these data, x (1)=1, x (2)=x (3)=3, x (4)=6, x (5)=x (6)=7.
In any such ordered list, the sample median m may be defined to be the number
m = x (½(n + 1))
as long as the subscript on the right-hand side is appropriately interpreted.
If the sample size n is odd, then the number ½(n+1) is an integer, and there is no problem of definition. For instance, if n=27 then ½(n+1)=14, and the sample median is m = x (14), the fourteenth value in the ordered list, with thirteen values on either side of it.
If the sample size n is even then the number ½(n+1) is not an integer but has a fractional part equal to ½. For instance, if n = 6 (as in the example above) then the sample median is
m = x (½(n+1)) = x (3½).
Numbers such as 3½ are sometimes called ‘half-integers’.
If the number x (3½) is interpreted as ‘the number halfway between x (3) and x (4)’ then you can see that the wordy definition survives intact. This obvious interpretation of numbers such as x (3½) can be extended to numbers such as x (2¼) and x (4¾): x (2¼) is the number one-quarter of the way from x (2) to x (3), and x (4¾) is the number three-quarters of the way from x (4) to x (5). Interpreting fractional subscripts in this way when they occur, the lower quartile (roughly, one-quarter of the way into the data set) and the upper quartile (approximately three-quarters of the way through the data set) may be defined as follows.
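The interpretation of fractional subscripts can be sketched in a few lines of Python. This is only an illustration: the function name order_stat is hypothetical, and the sketch assumes the data are already sorted and that the position p lies between 1 and n.

```python
import math

def order_stat(sorted_data, p):
    """Return x (p) for a possibly fractional 1-based position p.

    Assumes sorted_data is in increasing order and 1 <= p <= n.
    """
    n = len(sorted_data)
    if not 1 <= p <= n:
        raise ValueError("position p must lie between 1 and n")
    k = math.floor(p)            # integer part of the position
    frac = p - k                 # how far to move towards the next item
    lower = sorted_data[k - 1]   # x (k); Python lists are indexed from 0
    if frac == 0:
        return float(lower)
    upper = sorted_data[k]       # x (k+1)
    return lower + frac * (upper - lower)

data = sorted([7, 1, 3, 6, 3, 7])                # 1, 3, 3, 6, 7, 7
median = order_stat(data, (len(data) + 1) / 2)   # x (3½)
print(median)                                    # 4.5
```

For the six data items used earlier, x (3½) lies halfway between x (3)=3 and x (4)=6, which gives 4.5, the average of the middle two values.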
If a data set x_1, x_2, ..., x_n is reordered as x (1), x (2), ..., x (n), where
x (1) ≤ x (2) ≤ ... ≤ x (n)
then the lower sample quartile qL is defined by
qL = x (¼(n+1))
and the upper sample quartile qU is defined by
qU = x (¾(n+1))
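As a worked illustration of these two definitions, the following Python sketch computes qL, qU and the interquartile range for the six data items 7, 1, 3, 6, 3, 7. The function names are hypothetical, and the sketch assumes n is at least 3 so that ¼(n+1) is not less than 1.

```python
import math

def order_stat(sorted_data, p):
    """Return x (p), interpolating when the 1-based position p is fractional."""
    n = len(sorted_data)
    if not 1 <= p <= n:
        raise ValueError("position p must lie between 1 and n")
    k = math.floor(p)
    frac = p - k
    lower = sorted_data[k - 1]   # x (k); Python lists are indexed from 0
    if frac == 0:
        return float(lower)
    return lower + frac * (sorted_data[k] - lower)

def sample_quartiles(data):
    """Return (qL, qU) using positions ¼(n+1) and ¾(n+1)."""
    xs = sorted(data)
    n = len(xs)
    q_lower = order_stat(xs, (n + 1) / 4)
    q_upper = order_stat(xs, 3 * (n + 1) / 4)
    return q_lower, q_upper

q_lower, q_upper = sample_quartiles([7, 1, 3, 6, 3, 7])
iqr = q_upper - q_lower          # interquartile range
print(q_lower, q_upper, iqr)     # 2.5 7.0 4.5
```

Here n=6, so qL = x (1¾), which is three-quarters of the way from x (1)=1 to x (2)=3, and qU = x (5¼), which is one-quarter of the way from x (5)=7 to x (6)=7.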
Unfortunately, there is no universally accepted definition for sample quartiles, nor, indeed, a universally accepted nomenclature. The lower and upper sample quartiles are sometimes called the first and third sample quartiles. The median is the second sample quartile. Other definitions are possible, and you may even be familiar with some of them. For instance, some practitioners use
qL = x (¼n+½), qU = x (¾n+½)
while others use
qL = x (¼n+¾), qU = x (¾n+¼)
Still others insist that the lower and upper quartiles be defined in such a way that they are identified uniquely with actual sample items. However, almost all definitions of the sample median reduce to the same thing.
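To see how much the choice of definition can matter, the sketch below evaluates the three interpolating definitions given in this section on the same six data items. It relies on the observation that ¼(n+1) = ¼n+¼ and ¾(n+1) = ¾n+¾, so each definition can be written as a pair of offsets added to ¼n and ¾n. The function names and the offset parametrisation are illustrative assumptions, not a standard interface.

```python
import math

def order_stat(sorted_data, p):
    """Return x (p), interpolating when the 1-based position p is fractional."""
    n = len(sorted_data)
    if not 1 <= p <= n:
        raise ValueError("position p must lie between 1 and n")
    k = math.floor(p)
    frac = p - k
    lower = sorted_data[k - 1]   # x (k); Python lists are indexed from 0
    if frac == 0:
        return float(lower)
    return lower + frac * (sorted_data[k] - lower)

def quartiles_with_offsets(data, lower_offset, upper_offset):
    """Return (qL, qU) at positions ¼n + lower_offset and ¾n + upper_offset."""
    xs = sorted(data)
    n = len(xs)
    return (order_stat(xs, n / 4 + lower_offset),
            order_stat(xs, 3 * n / 4 + upper_offset))

data = [7, 1, 3, 6, 3, 7]
definitions = {
    "x (¼(n+1)), x (¾(n+1))": (0.25, 0.75),
    "x (¼n+½), x (¾n+½)": (0.5, 0.5),
    "x (¼n+¾), x (¾n+¼)": (0.75, 0.25),
}
results = {name: quartiles_with_offsets(data, lo, hi)
           for name, (lo, hi) in definitions.items()}
for name, (q_l, q_u) in results.items():
    print(name, q_l, q_u, q_u - q_l)
```

For these data the three definitions give interquartile ranges of 4.5, 4.0 and 3.75 respectively, so with small samples the choice of definition can change the result noticeably.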