2.1.2 Histograms

Histograms show the distribution of values for a quantitative (continuous) variable. Histograms are useful when there are many observations and you want to understand the overall shape and spread of your data.

The x-axis is marked in the units of measurement for the independent variable (e.g. age, time, MIC, zone diameter). The y-axis shows how often values fell within each interval, either as a count (frequency) or as a proportion of all observations.

To create a histogram, first group the values of the independent variable into class intervals (bins) of equal width, then count the number of values in each interval (the class frequency). Each class frequency is represented by a bar on the histogram, and the height of the bar corresponds to that frequency. An example of how to construct a histogram from raw data is shown below:

Example

In this example, a series of observations has been recorded in a variable called ‘Age’. To construct the histogram, the values of ‘Age’ are grouped into five class intervals of equal width (20 years), and the number of observations in each interval is counted.

Table 6 Variable name: Age
39  41  22  38  46  55  65  78  83  18
28  54  53  61  10  16  29  58  55  66
Table 7 Frequency of the variable: Age
Class interval | Class frequency | Observations
0–20 | 3 | 10, 16, 18
20.1–40 | 5 | 22, 28, 29, 38, 39
40.1–60 | 7 | 41, 46, 53, 54, 55, 55, 58
60.1–80 | 4 | 61, 65, 66, 78
80.1–100 | 1 | 83
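The grouping and counting in Table 7 can be checked with a short script. This is a minimal sketch, assuming that each interval includes its upper limit (so 20.1–40 covers values above 20 up to and including 40) and that the first interval also includes 0; the function name `group_into_classes` is hypothetical and chosen only for illustration. It also prints each class frequency as a proportion of all 20 observations, which is the alternative y-axis scale mentioned earlier.

```python
# A minimal sketch: rebuild Table 7 from the raw 'Age' values in Table 6.
ages = [39, 41, 22, 38, 46, 55, 65, 78, 83, 18,
        28, 54, 53, 61, 10, 16, 29, 58, 55, 66]

# Boundaries of five equal-width (20-year) class intervals, as in Table 7.
edges = [0, 20, 40, 60, 80, 100]


def group_into_classes(values, edges):
    """Return a sorted list of the observations falling in each interval.

    Assumption: each interval includes its upper limit (e.g. 20.1-40 means
    above 20 up to and including 40); the first interval also includes its
    lower limit. Values outside edges[0]..edges[-1] are ignored.
    """
    classes = [[] for _ in range(len(edges) - 1)]
    for v in values:
        for i in range(len(edges) - 1):
            lower, upper = edges[i], edges[i + 1]
            if lower < v <= upper or (i == 0 and v == lower):
                classes[i].append(v)
                break
    return [sorted(c) for c in classes]


classes = group_into_classes(ages, edges)
n = len(ages)
for i, obs in enumerate(classes):
    # Label intervals the way Table 7 does (0-20, 20.1-40, ...).
    lower = edges[i] if i == 0 else edges[i] + 0.1
    label = f"{lower:g}-{edges[i + 1]}"
    freq = len(obs)
    # Class frequency, the same count as a proportion, and the observations.
    print(f"{label:<9}{freq:>3}{freq / n:>6.2f}  {obs}")
```

The class frequencies printed (3, 5, 7, 4 and 1) match Table 7, and the proportions (0.15, 0.25, 0.35, 0.20 and 0.05) sum to 1. Other tools may treat interval boundaries differently (for example, including the lower limit instead of the upper one), which can move a value that sits exactly on a boundary into the neighbouring bar.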

The histogram is then constructed by plotting the class intervals on the x-axis and the class frequency (the number of occurrences in each class interval) on the y-axis.

Figure 3 Example histogram of the age groups of people
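As a rough illustration of how the bars in Figure 3 follow from Table 7, the sketch below draws a text-only histogram. It is an assumption-laden stand-in for a real chart: the bars run horizontally, so the class intervals read down the left margin instead of along the x-axis, and the helper `text_histogram` is a hypothetical name. A plotting library (for example matplotlib's `hist` function) would normally be used to produce a figure like Figure 3.

```python
# Class intervals and class frequencies copied from Table 7.
intervals = ["0-20", "20.1-40", "40.1-60", "60.1-80", "80.1-100"]
frequencies = [3, 5, 7, 4, 1]


def text_histogram(labels, counts, symbol="#"):
    """Return one text line per class interval, with a bar `count` symbols long.

    The bars run horizontally, so the class intervals read down the left
    margin instead of along an x-axis.
    """
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {symbol * count} ({count})"
            for label, count in zip(labels, counts)]


for line in text_histogram(intervals, frequencies):
    print(line)
```

Running it prints:

```text
    0-20 | ### (3)
 20.1-40 | ##### (5)
 40.1-60 | ####### (7)
 60.1-80 | #### (4)
80.1-100 | # (1)
```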

Traditionally, a histogram is drawn with no space between adjacent bars, to show that the class intervals are continuous and together cover every value of the variable. (This is different from a bar chart, which has spaces between its categories.) However, histograms are sometimes drawn with spaces between the classes for greater visual impact. You can see this in Figure 4.

Figure 4 Histogram showing the distribution of zone sizes obtained for oxolinic acid against Aeromonas salmonicida (n=323) (grey bars) and the breakpoints currently being used (black bars) (Smith, 2008)

The purpose of the histogram (or, indeed, of any graph) is to help understand the data. When viewing a histogram, look for important features, including the shape and spread of the data and whether there are any deviations (outliers). Outliers are data points that lie a long way from the general pattern in the data.
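The text above defines outliers informally. To make "a long way from the general pattern" concrete, the sketch below uses one common rule of thumb, Tukey's fences, which flags values more than 1.5 interquartile ranges below the first quartile or above the third quartile. This rule, the factor 1.5 and the helper name `tukey_outliers` are assumptions introduced for illustration, not a definition taken from this module; other rules exist, and judging a histogram by eye remains the approach described here.

```python
import statistics

# The 20 'Age' observations from Table 6.
ages = [39, 41, 22, 38, 46, 55, 65, 78, 83, 18,
        28, 54, 53, 61, 10, 16, 29, 58, 55, 66]


def tukey_outliers(values, k=1.5):
    """Flag values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is a common convention, not a fixed rule. Quartiles use the
    default ('exclusive') method of statistics.quantiles; other quartile
    methods give slightly different fences.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < low or v > high]
    return flagged, (low, high)


flagged, (low, high) = tukey_outliers(ages)
print(f"Fences: {low} to {high}; outliers: {flagged}")
```

For the Age data the fences are -19.75 and 108.25, so none of the 20 observations is flagged as an outlier by this rule.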

A histogram can have different shapes: it can be unimodal (a single peak, representing the interval with the most values, as in a normal distribution), bimodal (two peaks) or multimodal (more than two peaks). A histogram can also be symmetrical (the left and right sides of the midpoint are roughly mirror images) or skewed. In a positively (right-) skewed histogram, most values are grouped towards the left and a long tail extends to the right; in a negatively (left-) skewed histogram, most values are grouped towards the right and the tail extends to the left.
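The shape features described above are usually judged by eye, but two simple numerical checks can support that judgement. The sketch below, under stated assumptions, counts the class intervals whose frequency is higher than both neighbours (a crude count of peaks) and computes the moment coefficient of skewness, where a positive value indicates a long right tail and a negative value a long left tail. Both helpers, `count_peaks` and `skewness`, are hypothetical names, and neither measure is prescribed by this module.

```python
import statistics

# Table 7 class frequencies and the raw Table 6 'Age' values.
frequencies = [3, 5, 7, 4, 1]
ages = [39, 41, 22, 38, 46, 55, 65, 78, 83, 18,
        28, 54, 53, 61, 10, 16, 29, 58, 55, 66]


def count_peaks(counts):
    """Count intervals whose frequency is higher than both neighbours.

    A rough guide only: flat-topped peaks (equal neighbouring counts) are
    missed, and small chance bumps in the counts are counted as peaks.
    """
    padded = [0, *counts, 0]  # treat the space beyond each end as zero
    return sum(1 for i in range(1, len(padded) - 1)
               if padded[i - 1] < padded[i] > padded[i + 1])


def skewness(values):
    """Moment coefficient of skewness (population form).

    Positive: long tail to the right; negative: long tail to the left;
    close to 0: roughly symmetrical.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)


print("peaks:", count_peaks(frequencies))
print("skewness:", round(skewness(ages), 3))
```

For the Age example there is a single peak (the 40.1–60 interval), so the histogram is unimodal, and the skewness is about -0.06: very slightly negative, so the distribution is close to symmetrical. With a different choice of class intervals the peak count can change, so it is worth trying more than one bin width before concluding that data are bimodal.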

AMR data often has a bimodal shape because the isolates tested often come from two separate populations: those that are susceptible and those that are resistant. AMR data is also often skewed. Depending on the population of interest, there might be more isolates that are susceptible than resistant, or vice versa. For example, in secondary care, more of the isolates tested may be resistant to an antimicrobial agent than susceptible, because the patients who are sampled are more likely to have already been treated with antimicrobials and experienced treatment failure. Figure 4 demonstrates a bimodal distribution of zone diameter measurements obtained by testing the susceptibility of Aeromonas salmonicida to oxolinic acid. (Note: the purpose of including Figure 4 in this module is to demonstrate a typical distribution of AMR data. Therefore, it is not necessary to interpret the breakpoints depicted on the graph.)

The strengths and limitations of histograms are listed below:

Table 8 Strengths and limitations of histograms
Strengths | Limitations
Summarise large datasets | Cannot read exact values of each data point from histograms, as the data is collapsed into categories
Show the relative frequency of occurrences of different data values | Difficult to compare two datasets
Demonstrate visually the variation and distribution shape of data, which is useful when determining the statistical approach you may take to explore associations with your data | Can only be used with continuous data
