2.1.2 Morbidity statistics
Epidemiology also involves estimating the frequency and distribution of diseases in populations. Measures of disease frequency are tools with which to describe how common an illness is in relation to the size of the population (the population at risk). These measures relate the number of cases to the size of the population and to a period or point in time. The two main types of measure of disease frequency are incidence and prevalence.
- Incidence: This is the number of new cases of a disease or disorder that arise in a defined population over a defined period of time.
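Incidence rates are calculated as follows:
Incidence rate = (number of new cases of the disease in a specified time period ÷ number of people in the population at risk during that period) × 1,000
Multiplying by 1,000 gives the rate per 1,000 of the population.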
The specified time period is usually a calendar year. Therefore, to calculate the incidence of ovarian cancer in women in Wales in 2002 per 1,000 of the population, you would need to divide the number of new cases of ovarian cancer in women in Wales by the number of women resident in Wales in 2002 and then multiply by 1,000.
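The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `incidence_rate` and the figures used are hypothetical and are not real data for Wales.

```python
def incidence_rate(new_cases, population_at_risk, per=1000):
    """Return the number of new cases per `per` people at risk in the period."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk * per


# Hypothetical figures, chosen only to show the calculation.
example_new_cases = 300          # new cases diagnosed during the year
example_population = 1_500_000   # women resident in the area that year

rate = incidence_rate(example_new_cases, example_population)
print(f"{rate:.2f} new cases per 1,000 women in the year")
```

Changing `per` to 100,000 would express the same rate per 100,000 of the population, which can be easier to read for rare diseases.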
Figure 3 shows the number of newly diagnosed cases of chlamydia infection in the UK in 2002. It gives new cases of chlamydia diagnosed in genito-urinary medicine clinics in England, Wales and Scotland.
Thinking point: What additional information would you need in order to calculate the incidence of chlamydia in a given area: for example, Scotland?
You would need the actual numbers of men and women in Scotland. Because chlamydia infection is more common in younger adults, you might also want to calculate age-specific incidence rates for men and women. For that you would need the new cases of chlamydia broken down by age as well as by sex, together with the numbers of men and women in each age group in Scotland. Incidence figures, of course, cover only reported and diagnosed cases of chlamydia, so the actual size of the problem is likely to be larger. The same applies to prevalence rates.
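As a rough sketch of the age- and sex-specific calculation, the same division can be repeated for each group. The group labels, case counts and population sizes below are invented for illustration and are not figures for Scotland; `specific_rates` is a hypothetical helper name.

```python
def specific_rates(new_cases, population, per=1000):
    """Return the incidence rate per `per` people for each (sex, age group) key."""
    rates = {}
    for group, cases in new_cases.items():
        # Each group needs its own denominator: the people at risk in that group.
        rates[group] = cases / population[group] * per
    return rates


# Hypothetical new cases and resident populations by sex and age group.
cases_by_group = {
    ("female", "16-24"): 120,
    ("female", "25-34"): 60,
    ("male", "16-24"): 80,
    ("male", "25-34"): 50,
}
population_by_group = {
    ("female", "16-24"): 40_000,
    ("female", "25-34"): 50_000,
    ("male", "16-24"): 40_000,
    ("male", "25-34"): 50_000,
}

for group, rate in specific_rates(cases_by_group, population_by_group).items():
    print(group, f"{rate:.1f} per 1,000")
```

A group missing from the population table would raise a `KeyError`, which mirrors the point made above: without the denominator for a group, its rate cannot be calculated.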
- Prevalence: This is the total number of people suffering from a specific disease at a certain point in time (point prevalence) or during a specified period (period prevalence). Prevalence studies are commonly used to survey characteristics such as smoking habits or alcohol use.
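Prevalence rates are calculated as follows:
Prevalence rate = (total number of people with the disease at a given time ÷ total number of people in the population at that time) × 1,000
As with incidence, multiplying by 1,000 gives the rate per 1,000 of the population.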
Incidence and prevalence rates can be calculated separately for men and women. Figure 4 shows the period prevalence rates of chlamydia, gonorrhoea and genital herpes simplex for Scotland between 1992 and 2002.
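A prevalence rate can be sketched in the same way as an incidence rate, except that the numerator counts everyone who has the condition or characteristic at the time, not just new cases. The example below assumes a small, made-up survey of smoking habits; the function name and the responses are hypothetical.

```python
def prevalence_rate(existing_cases, population, per=1000):
    """Return the number of existing cases per `per` people in the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return existing_cases / population * per


# Hypothetical survey responses: True means the respondent currently smokes.
survey = [True, False, False, True, False, False, False, True, False, False]

smokers = sum(survey)        # True counts as 1, so this counts current smokers
respondents = len(survey)
rate = prevalence_rate(smokers, respondents)
print(f"{rate:.0f} smokers per 1,000 people surveyed")
```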
These measures of morbidity, as well as those of mortality, are the raw data used in both descriptive and analytic epidemiology.
Activity 2 Plotting correlations
The evidence generated by most epidemiological studies is correlational: although potentially powerful, it cannot be presumed to be causal. It does, however, identify ‘risk factors’, so the concept of correlation is an important one to understand. One key way of discovering whether there is a relationship between two measured variables is to use a statistic called the correlation coefficient.
Read the explanation of how correlations are calculated below. Then look at the two scatter diagrams (Figures 6 and 7) in the examples given. Which, if either, shows a high correlation?
Identifying relationships between variables: the correlation coefficient
Where there is a linear relationship between two variables there is said to be a correlation between them. Examples are height and weight in children, or socio-economic class and mortality.
The strength of that relationship is given by the ‘correlation coefficient’. What does it mean?
The correlation coefficient is usually denoted by the letter ‘r’: for example, r = 0.8.
A positive correlation coefficient means that as one variable is increasing the value for the other variable is also increasing – the line on the graph slopes up from left to right. Height and weight have a positive correlation: children get heavier as they grow taller.
A negative correlation coefficient means that as the value of one variable goes up the value for the other variable goes down – the graph slopes down from left to right. Higher socio-economic class is associated with a lower mortality, giving a negative correlation between the two variables.
If there is a perfect relationship between the two variables then r = 1 (if a positive correlation) or r = -1 (if a negative correlation).
If there is no correlation at all (the points on the graph are completely randomly scattered) then r = 0.
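The text does not say which correlation coefficient is meant. The sketch below assumes the Pearson product-moment coefficient, which is the usual choice for a linear relationship, and uses invented measurements to show the perfect positive and perfect negative cases.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Invented data: y rises in exact step with x, so r should be 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
# Invented data: y falls in exact step with x, so r should be -1.
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

Real measurements, such as children's heights and weights, would give a value between these extremes because the points do not lie exactly on a straight line.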
The following is a good rule of thumb when considering the size of a correlation whether positive or negative:
- r = 0–0.2: very low and probably meaningless
- r = 0.2–0.4: a low correlation that might warrant further investigation
- r = 0.4–0.6: a reasonable correlation
- r = 0.6–0.8: a high correlation
- r = 0.8–1.0: a very high correlation. Check for errors or other reasons for such a high correlation.
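The rule of thumb above can be written as a small lookup. Because the bands share their end points (0.2, 0.4 and so on), this sketch assumes that a boundary value belongs to the higher band; that choice, and the function name `describe_correlation`, are assumptions made for illustration.

```python
def describe_correlation(r):
    """Label the size of a correlation coefficient using the rule of thumb."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the rule applies whether r is positive or negative
    if size < 0.2:
        return "very low"
    if size < 0.4:
        return "low"
    if size < 0.6:
        return "reasonable"
    if size < 0.8:
        return "high"
    return "very high"


print(describe_correlation(0.88))   # very high
print(describe_correlation(-0.34))  # low
```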
Example 1
A nurse wanted to be able to predict the laboratory HbA1c results (a measure of blood glucose control) from the fasting blood glucose levels that she measured in her clinic. For 12 consecutive patients with diabetes, she recorded the fasting glucose level and, at the same time, drew blood for HbA1c. She compared the pairs of measurements and drew the scatter diagram in Figure 6.
Her results showed that r = 0.88.
Example 2
An occupational therapist developed a scale for measuring physical activity and wondered how closely it correlated with body mass index (BMI) in 12 of her adult patients. Figure 7 shows how the two measures were related.
Her results showed that r = -0.34.
(Adapted from Harris and Taylor, 2004, pp. 24–25)
Comment
An r of 0.88 indicates a very high correlation, and this is obvious if you draw a line through the dots in Example 1. In Example 2, an r of -0.34 indicates a low correlation, and it is certainly not easy to see this from the diagram. Notice that in the second example r is negative, which means that the correlation is negative: in this case, patients with a higher level of physical activity tend to have a lower BMI.