Epidemiology: An introduction

2.1.2 Morbidity statistics

Epidemiology also involves estimating the frequency and distribution of diseases in populations. Measures of disease frequency are tools for describing how common an illness is in relation to the size of the population (the population at risk). These measures relate the number of cases in a population to a measure of time. The two main types of measure of disease frequency are incidence and prevalence.

  1. Incidence: This is the number of new cases of a disease or disorder that arise in a defined population over a defined period of time.

Incidence rates are calculated as follows:

(Source: Royal Free Medical School, 2001, p. 83)

The specified time period is usually a calendar year. For example, to calculate the incidence of ovarian cancer in women in Wales in 2002 per 1,000 of the population, you would divide the number of new cases of ovarian cancer in women in Wales in 2002 by the number of women resident in Wales in 2002, and then multiply by 1,000.
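The calculation just described can be sketched in a few lines of Python. This is a minimal illustration only: the function name `incidence_rate` and the figures used are hypothetical and are not real data for Wales.

```python
def incidence_rate(new_cases, population_at_risk, per=1000):
    """New cases in the period divided by the population at risk, scaled to `per` people."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk * per


# Hypothetical figures for illustration only (not real data).
example_new_cases = 300
example_population = 1_500_000

example_rate = incidence_rate(example_new_cases, example_population)
print(f"{example_rate:.2f} new cases per 1,000 women")
```

Changing the `per` argument gives the same rate expressed per 10,000 or per 100,000 of the population, which is often easier to read for rarer diseases.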

Figure 3 shows the number of newly diagnosed cases of chlamydia infection in the UK in 2002. It gives new cases of chlamydia diagnosed in genito-urinary medicine clinics in England, Wales and Scotland.

Figure 3 Map of new cases of chlamydia in 2002 (Source: Health Protection Agency, 2003, p. 28, Figure 7a)

Thinking point: What additional information would you need in order to calculate the incidence of chlamydia in a given area: for example, Scotland?

You would need the actual numbers of men and women in Scotland. If you wanted to calculate the age-specific incidence rate for men and women (because chlamydia infection is more common in younger adults), you would need the new cases of chlamydia broken down by age as well as by sex, together with the numbers of men and women in each age group in Scotland. Incidence calculated in this way, of course, reflects only reported and diagnosed cases of chlamydia. The actual size of the problem is likely to be larger. The same applies to prevalence rates.
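To see why both sets of numbers are needed, here is a rough sketch of an age- and sex-specific calculation. All counts, age bands and names (`new_cases`, `population`, `specific_rates`) are invented for illustration and are not real figures for Scotland.

```python
# Hypothetical counts for illustration only; these are not real figures for Scotland.
new_cases = {
    ("women", "16-24"): 400,
    ("women", "25-34"): 150,
    ("men", "16-24"): 200,
    ("men", "25-34"): 120,
}
population = {
    ("women", "16-24"): 250_000,
    ("women", "25-34"): 300_000,
    ("men", "16-24"): 250_000,
    ("men", "25-34"): 300_000,
}


def specific_rates(cases, pop, per=1000):
    """Incidence per `per` people for each (sex, age group) stratum."""
    return {group: cases[group] / pop[group] * per for group in cases}


rates = specific_rates(new_cases, population)
for (sex, age_group), rate in sorted(rates.items()):
    print(f"{sex:5s} {age_group}: {rate:.2f} per 1,000")
```

Each rate uses the new cases and the population for the same group, so a missing population figure for any age band would leave that band's rate impossible to calculate.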

  2. Prevalence: This is the total number of people suffering from a specific disease at a certain point in time. Prevalence studies are commonly used to survey characteristics such as smoking habits or alcohol use.

Prevalence rates are calculated as follows:

(Source: Royal Free Medical School, 2001, p. 83)
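As a rough illustration, a prevalence rate can be computed in the same way as an incidence rate, except that the count is of everyone who has the disease at the chosen point in time rather than of new cases only. The sketch below assumes the standard definition of point prevalence (existing cases divided by the population at that time); it is not a reproduction of the cited formula, and the function name and figures are hypothetical.

```python
def point_prevalence(existing_cases, population, per=1000):
    """Everyone with the condition at one point in time, per `per` people in the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    if existing_cases > population:
        raise ValueError("existing_cases cannot exceed the population")
    return existing_cases / population * per


# Hypothetical survey figures for illustration only (not real data).
people_with_condition = 4_500
people_in_population = 150_000

prevalence = point_prevalence(people_with_condition, people_in_population)
print(f"{prevalence:.1f} per 1,000 of the population")
```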

Incidence and prevalence rates can be calculated separately for men and women. Figure 4 shows the period prevalence rates of chlamydia, gonorrhoea and genital herpes simplex for Scotland between 1992 and 2002.

Figure 4 Laboratory reports for chlamydia, gonorrhoea and genital herpes simplex for Scotland, 1992–2002 (Source: Health Protection Agency, 2003, p. 29, Figure 7b)

These measures of morbidity, as well as those of mortality, are the raw data used in both descriptive and analytic epidemiology.

Activity 2 Plotting correlations

30 minutes

The evidence generated by most epidemiological studies is correlational. Although potentially powerful, it cannot be presumed to be causal. It does, however, identify ‘risk factors’, so the concept of correlation is an important one to understand. One key way of discovering whether there is a relationship between two measured variables is to use a statistic called the correlation coefficient.

Read the explanation of how correlations are calculated below. Then look at the two scatter diagrams (Figures 6 and 7) in the examples given. Which, if either, shows a high correlation?

Identifying relationships between variables: the correlation coefficient

Where there is a linear relationship between two variables there is said to be a correlation between them. Examples are height and weight in children, or socio-economic class and mortality.

Figure 5 Scatter plot showing a linear relationship between heights and weights of children.

The strength of that relationship is given by the ‘correlation coefficient’. What does it mean?

The correlation coefficient is usually denoted by the letter ‘r’: for example, r = 0.8.

A positive correlation coefficient means that as one variable is increasing the value for the other variable is also increasing – the line on the graph slopes up from left to right. Height and weight have a positive correlation: children get heavier as they grow taller.

A negative correlation coefficient means that as the value of one variable goes up the value for the other variable goes down – the graph slopes down from left to right. Higher socio-economic class is associated with a lower mortality, giving a negative correlation between the two variables.

If there is a perfect relationship between the two variables then r = 1 (if a positive correlation) or r = -1 (if a negative correlation).

If there is no correlation at all (the points on the graph are completely randomly scattered) then r = 0.
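The text does not say which formula is used to obtain r. A common choice, assumed here, is Pearson's product-moment correlation coefficient. The following is a minimal sketch of that calculation; the function name `pearson_r` and the height and weight values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How far each pair of values moves together, relative to the two means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return co_variation / math.sqrt(spread_x * spread_y)


# Hypothetical heights (cm) and weights (kg) of six children, for illustration only.
heights = [100, 110, 120, 130, 140, 150]
weights = [16, 19, 22, 27, 33, 40]
print(f"r = {pearson_r(heights, weights):.2f}")
```

With these made-up values r is positive, because the heavier children are also the taller ones. Points lying exactly on a straight line would give r = 1 or r = -1, depending on the direction of the slope.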

The following is a good rule of thumb when considering the size of a correlation whether positive or negative:

  • r = 0–0.2: very low and probably meaningless
  • r = 0.2–0.4: a low correlation that might warrant further investigation
  • r = 0.4–0.6: a reasonable correlation
  • r = 0.6–0.8: a high correlation
  • r = 0.8–1.0: a very high correlation. Check for errors or other reasons for such a high correlation.
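The rule of thumb above can be written as a small lookup, which makes it easy to apply to any value of r. This is a sketch only: the function name `describe_correlation` is hypothetical, and because the bands in the list share their end points, the code assumes that a value lying exactly on a boundary (for example, 0.4) falls into the higher band.

```python
def describe_correlation(r):
    """Label the size of a correlation coefficient using the rule-of-thumb bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign gives the direction, not the strength
    # Assumption: a value exactly on a boundary is placed in the higher band.
    if size < 0.2:
        return "very low"
    if size < 0.4:
        return "low"
    if size < 0.6:
        return "reasonable"
    if size < 0.8:
        return "high"
    return "very high"


# The two values of r reported in the examples in this activity.
for r in (0.88, -0.34):
    print(r, describe_correlation(r))
```

The sign is dropped before the bands are applied, so a coefficient of -0.34 is treated as being the same size as one of 0.34.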

Example 1

A nurse wanted to be able to predict laboratory HbA1c results (a measure of blood glucose control) from the fasting blood glucose levels that she measured in her clinic. For 12 consecutive patients with diabetes, she noted the fasting glucose level and simultaneously drew blood for HbA1c. She compared the pairs of measurements and drew the scatter diagram in Figure 6.

Figure 6 Scatter diagram: plot of fasting glucose and HbA1c in 12 patients with diabetes.

Her results showed that r = 0.88.

Example 2

An occupational therapist developed a scale for measuring physical activity and wondered how well it correlated with body mass index (BMI) in 12 of her adult patients. Figure 7 shows how the two measures were related.

Figure 7 Scatter diagram: plot of BMI and activity in 12 adult patients.

Her results showed that r = -0.34.

(Adapted from Harris and Taylor, 2004, pp. 24–25)


An r of 0.88 indicates a very high correlation, and this is obvious if you draw a line through the dots in Example 1. In Example 2, an r of -0.34 indicates a low correlation, and it is certainly not easy to see this from the diagram. Notice that r is negative in the second example, so the correlation is negative: in this case, patients with a higher level of physical activity tend to have a lower BMI.

