Interpreting data: Boxplots and tables

2.4 Including the results of useful calculations

Can Table 2.4 be simplified further by pooling more rows or columns? Perhaps it might be, but there may well be a risk of losing some important or relevant information. So, before considering any further simplification, we shall look at adding information to the table, in the form of the results of some helpful calculations (guideline 4).

On their own, some of the numbers in the table still do not mean a great deal. There were 61 new cases among males in the 55–59 age group. But how does this compare with males in other age groups, and with females? There were 60 new cases for males aged 70–74. On the face of it this looks very close to the figure for the 55–59 group. But there were far more males in the South Australian population aged 55–59 than there were aged 70–74 (35192 compared to 16613). It seems likely that the main interest in these data is in the varying chances of developing lung cancer or dying from it, at different ages and for the two genders. To find out something about this, it is useful to calculate the proportions of the different age groups that became new cases of lung cancer. For males aged 55–59, the proportion is 61/35192=0.0017333, or 0.17333% as a percentage. For males aged 70–74 the corresponding proportion is 60/16613=0.0036116, or 0.36116%. It is very common, and often very useful, to calculate such quantities, which are often known as rates.
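The two rates quoted above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `rate` is made up for this example, and the counts are the ones given in the text for males aged 55–59 and 70–74.

```python
def rate(new_cases, population):
    """Proportion of an age group that became new cases."""
    return new_cases / population

# Counts quoted in the text (South Australia, males, 1981)
males_55_59 = rate(61, 35192)
males_70_74 = rate(60, 16613)

# Show each rate as a proportion and as a percentage
print(f"55-59: {males_55_59:.7f} or {males_55_59 * 100:.5f}%")
print(f"70-74: {males_70_74:.7f} or {males_70_74 * 100:.5f}%")
```

Although the two counts (61 and 60) are almost equal, the rate for the older group comes out at roughly twice that of the younger group, because the older population is less than half the size.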

For the time being, we shall just look at the new cases and omit the information on deaths. The rate for new cases in each age group has been calculated for males and for females; these rates are included in Table 2.5. As you can see, these numbers do not look particularly user-friendly!

Table 2.5 South Australia: incidence for lung cancer, 1981
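| Age group | Population size (male) | Population size (female) | New cases (male) | New cases (female) | New cases as % of population size (male) | New cases as % of population size (female) |
|---|---|---|---|---|---|---|
| 0–39 | 427725 | 414937 | 1 | 2 | 0.00023380 | 0.00048200 |
| 40–44 | 35648 | 35547 | 2 | 5 | 0.0056104 | 0.014066 |
| 45–49 | 32911 | 31799 | 8 | 2 | 0.024308 | 0.0062895 |
| 50–54 | 36485 | 35333 | 38 | 8 | 0.10415 | 0.022642 |
| 55–59 | 35192 | 35555 | 61 | 18 | 0.17333 | 0.050626 |
| 60–64 | 28131 | 30868 | 67 | 16 | 0.23817 | 0.051834 |
| 65–69 | 24419 | 27390 | 88 | 15 | 0.36038 | 0.054765 |
| 70–74 | 16613 | 21402 | 60 | 21 | 0.36116 | 0.098122 |
| 75–79 | 9958 | 14546 | 46 | 10 | 0.46194 | 0.068747 |
| 80–84 | 4852 | 9749 | 24 | 6 | 0.49464 | 0.061545 |
| 85+ | 2790 | 7477 | 7 | 2 | 0.25090 | 0.026749 |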

The table still looks pretty horrible and the information it contains is difficult to assimilate, largely because there is too much clutter from information of dubious relevance, and also because far too many decimal places are included in the last two columns. The latter problem is easily solved, in accord with guideline 3. First, note that (for example) the figure of 0.098122% for females aged 70–74 means that, for every 100 women in this age group (in South Australia in 1981), there were 0.098122 new cases of lung cancer. In this context there is nothing special about calculating the rate per 100 women in the population. Instead, the number of cases per 100 000 women in the population will be calculated. This has the effect of multiplying all the rates by 1000, which gets rid of most of the occurrences of ‘0.0…’ at the start of the numbers, and hence makes the table easier to read. Also, simply to get across the main message of these data does not require five significant figures. Instead, in Table 2.6, the figures are given to one decimal place.
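As a rough illustration of this rescaling, the sketch below converts a count and a population size into a rate per 100 000, rounded to one decimal place. The helper name `per_100_000` is invented for this example; the figures are those for females aged 70–74 and males aged 0–39.

```python
def per_100_000(new_cases, population):
    """New cases per 100 000 population, rounded to one decimal place."""
    return round(new_cases / population * 100_000, 1)

# Females aged 70-74: 21 new cases in a population of 21402
percent = 21 / 21402 * 100             # about 0.098122 (cases per 100)
rate_70_74_f = per_100_000(21, 21402)  # the same rate per 100 000

# Males aged 0-39: 1 new case in a population of 427725
rate_0_39_m = per_100_000(1, 427725)

print(rate_70_74_f, rate_0_39_m)
```

Moving from 'per 100' to 'per 100 000' is just a multiplication by 1000, so the comparison between groups is unchanged; only the readability improves.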

Table 2.6 South Australia: incidence for lung cancer, 1981
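| Age group | Population size (male) | Population size (female) | New cases (male) | New cases (female) | New cases per 100 000 population (male) | New cases per 100 000 population (female) |
|---|---|---|---|---|---|---|
| 0–39 | 427725 | 414937 | 1 | 2 | 0.2 | 0.5 |
| 40–44 | 35648 | 35547 | 2 | 5 | 5.6 | 14.1 |
| 45–49 | 32911 | 31799 | 8 | 2 | 24.3 | 6.3 |
| 50–54 | 36485 | 35333 | 38 | 8 | 104.2 | 22.6 |
| 55–59 | 35192 | 35555 | 61 | 18 | 173.3 | 50.6 |
| 60–64 | 28131 | 30868 | 67 | 16 | 238.2 | 51.8 |
| 65–69 | 24419 | 27390 | 88 | 15 | 360.4 | 54.8 |
| 70–74 | 16613 | 21402 | 60 | 21 | 361.2 | 98.1 |
| 75–79 | 9958 | 14546 | 46 | 10 | 461.9 | 68.7 |
| 80–84 | 4852 | 9749 | 24 | 6 | 494.6 | 61.5 |
| 85+ | 2790 | 7477 | 7 | 2 | 250.9 | 26.7 |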

Now does it make sense to simplify the table any further? If we want to use it to communicate information about the relative chances of being diagnosed as a new case of lung cancer at different ages and for the two genders, the ‘Population size’ and ‘New cases’ columns do not actually give very relevant information. It might therefore be reasonable to omit them. Furthermore, the general pattern of the new case rates at different ages can be communicated with rather fewer age groups than were used in Table 2.6. Table 2.7 uses fewer and coarser age groupings, and the only figures given are the calculated values of the new cases per 100 000 and deaths per 100 000; these have been rounded to one decimal place. (Note that the figures for new cases in Table 2.7 cannot be calculated simply from the rates given in the last two columns of Table 2.6. The appropriate population sizes and counts of cases must be aggregated and the aggregates used to calculate the rates.)
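The point in the parenthetical note can be checked numerically. The sketch below pools the three youngest male age groups from Table 2.6 into a single 0–49 group and compares the correctly aggregated rate with a naive average of the three separate rates. The variable names are illustrative only, and the naive average is shown purely as an example of what not to do.

```python
# (population size, new cases) for males, copied from Table 2.6
males = {
    "0–39": (427725, 1),
    "40–44": (35648, 2),
    "45–49": (32911, 8),
}

# Correct approach: aggregate populations and cases, then compute the rate
total_population = sum(pop for pop, _ in males.values())
total_cases = sum(cases for _, cases in males.values())
pooled_rate = round(total_cases / total_population * 100_000, 1)

# Naive approach: average the three separate rates (ignores group sizes)
separate_rates = [cases / pop * 100_000 for pop, cases in males.values()]
naive_rate = round(sum(separate_rates) / len(separate_rates), 1)

print(pooled_rate, naive_rate)
```

The pooled rate matches the male 0–49 entry for new cases in Table 2.7, whereas the naive average is several times larger, because it gives the small 45–49 group the same weight as the very much larger 0–39 group.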

Table 2.7 South Australia: incidence and mortality for lung cancer, 1981 (rates per 100 000 population)
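| Age group | New cases (male) | New cases (female) | Deaths (male) | Deaths (female) |
|---|---|---|---|---|
| 0–49 | 2.2 | 1.9 | 3.0 | 1.0 |
| 50–59 | 138.1 | 36.7 | 96.3 | 22.6 |
| 60–69 | 295.0 | 53.2 | 239.8 | 54.9 |
| 70–79 | 398.9 | 86.2 | 402.7 | 83.5 |
| 80+ | 405.7 | 46.4 | 405.7 | 40.6 |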

(Whole numbers in the deaths column would arguably have been quite adequate to get across the message of these data. Using one decimal place has the advantage of making it clear that these are rates, and not counts of individual cases.)

This is a quickly assimilated table that communicates the pattern of incidence and death from lung cancer, in relation to population size. It is easy to compare the figures for males and females, and it is equally easy to compare incidence with mortality in any of the age groups.

Activity 4 Describing data in a table

• (a) Describe the main patterns in the data on lung cancer in South Australia, on the basis of Table 2.7.

• (b) Table 2.7 is certainly much simpler than the earlier tables in this section, and you would probably agree that the patterns in the data are easier to see. But can you think of any disadvantages of the presentation in Table 2.7 compared to the other tables?