2.1.2 Histograms

Histograms show the distribution of values for a quantitative (continuous) variable. Histograms are useful when there are many observations and you want to understand the overall shape and spread of your data.

The x-axis is marked in the units of measurement of the variable being plotted (e.g. age, time, MIC, zone diameter). The y-axis shows how often values fall within each interval, expressed either as a count (frequency) or as a proportion.

To create a histogram, first group the data into class intervals (bins) of equal width, then count the values that fall in each interval (the class frequency). Each class frequency is represented by a bar on the histogram, and the height of the bar corresponds to that frequency. An example of how to construct a histogram from raw data is shown below:

Example

In this example, a series of observations has been recorded in a variable called ‘Age’ (Table 6). To construct the histogram, ‘Age’ is split into five class intervals and the number of observations falling in each interval is counted (Table 7).

Table 6 Variable name: Age
39	41	22	38	46	55	65	78	83	18
28	54	53	61	10	16	29	58	55	66

Table 7 Frequency of the variable: Age
Class interval	Class frequency	Observations
0–20	3	10, 16, 18
20.1–40	5	22, 28, 29, 38, 39
40.1–60	7	41, 46, 53, 54, 55, 55, 58
60.1–80	4	61, 65, 66, 78
80.1–100	1	83
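
The grouping step behind Table 7 can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function names are hypothetical, and it assumes that the upper limit of each class interval is inclusive (so a value of 20 would be counted in 0–20 and a value of 21 in 20.1–40).

```python
import math

# The 20 observations of 'Age' from Table 6.
ages = [39, 41, 22, 38, 46, 55, 65, 78, 83, 18,
        28, 54, 53, 61, 10, 16, 29, 58, 55, 66]

WIDTH = 20      # equal class width, as in Table 7
N_CLASSES = 5   # five class intervals covering 0 to 100


def class_index(value, width=WIDTH):
    """Return the 0-based class interval that a value belongs to.

    Assumption: upper limits are inclusive, so 20 falls in 0-20
    and 21 falls in 20.1-40.
    """
    return max(math.ceil(value / width) - 1, 0)


def class_frequencies(values, n_classes=N_CLASSES, width=WIDTH):
    """Count how many values fall in each class interval."""
    counts = [0] * n_classes
    for value in values:
        counts[class_index(value, width)] += 1
    return counts


frequencies = class_frequencies(ages)
print(frequencies)  # [3, 5, 7, 4, 1]
```

The printed counts match the class frequencies in Table 7, and they add up to the 20 observations in Table 6.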

The histogram is then constructed by plotting the class intervals on the x-axis, with the y-axis showing the frequency (number) of occurrences in each class interval.

Figure 3 Example histogram of the age groups of people

A histogram showing the number of people in a sample in each of five equal age groups (from 0–20 to 80.1–100).
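
As a rough stand-in for a plotted figure, the class frequencies from Table 7 can be drawn as a plain-text histogram. The sketch below is illustrative only: the function name is hypothetical, the bars run sideways rather than upwards, and each line also reports the proportion of observations in that class. In practice a plotting library would normally be used instead.

```python
# Class intervals and class frequencies copied from Table 7.
classes = ["0–20", "20.1–40", "40.1–60", "60.1–80", "80.1–100"]
frequencies = [3, 5, 7, 4, 1]


def text_histogram(labels, counts, symbol="#"):
    """Return one line per class: label, a bar of symbols, count and proportion."""
    total = sum(counts)
    label_width = max(len(label) for label in labels)
    lines = []
    for label, count in zip(labels, counts):
        proportion = count / total
        bar = symbol * count  # bar length equals the class frequency
        lines.append(f"{label:<{label_width}} | {bar} {count} ({proportion:.0%})")
    return lines


for line in text_histogram(classes, frequencies):
    print(line)
```

The longest bar belongs to the 40.1–60 class, which contains 7 of the 20 observations (35%).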

Traditionally, a histogram is drawn with no space between classes to indicate all values of the variable are represented. (This is different from a bar chart, which has space between the classes.) However, sometimes histograms may be drawn with spaces between the classes for greater visual impact. You can see this in Figure 4.

Figure 4 Histogram showing the distribution of zone sizes obtained for oxolinic acid against Aeromonas salmonicida (n=323) (grey bars) and the breakpoints currently being used (black bars) (Smith, 2008)

Histogram showing the distribution of zone sizes obtained from testing oxolinic acid against A. salmonicida (grey bars, showing a bimodal distribution) with the breakpoints currently used superimposed as eight black bars. This is a typical distribution for AMR data.

The purpose of the histogram (or, indeed, of any graph) is to help understand the data. When viewing a histogram, look for important features, including the shape and spread of the data and whether there are any deviations (outliers). Outliers are data points that lie a long way from the general pattern in the data.

A histogram can have different shapes: it can be unimodal (a single peak representing the interval with the most values, e.g. a normal distribution), bimodal (two peaks) or multimodal (more than two peaks). A histogram can also be symmetrical (when the left and right sides of the midpoint are similar) or skewed. In a positively skewed histogram, most values are grouped on the left and a long tail extends to the right; in a negatively skewed histogram, most values are grouped on the right and a long tail extends to the left.
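
These shape descriptions can also be checked numerically. The sketch below is one possible approach, not part of the course method: it uses the moment coefficient of skewness (under the usual convention that a positive value indicates a longer tail on the right) and a simple count of class intervals that are taller than both neighbours. The function names and the small datasets, apart from the Table 7 frequencies, are hypothetical and chosen only for illustration. Adjacent intervals with equal frequencies are not handled.

```python
def skewness(values):
    """Moment coefficient of skewness: > 0 means a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def count_peaks(counts):
    """Count class intervals whose frequency exceeds both neighbours."""
    padded = [0] + list(counts) + [0]  # treat the outside as empty classes
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical values with a long tail to the right, and their mirror image.
right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 10]
left_tailed = [-v for v in right_tailed]

print(skewness(right_tailed) > 0)   # True: positively skewed
print(skewness(left_tailed) < 0)    # True: negatively skewed

print(count_peaks([3, 5, 7, 4, 1]))            # 1: Table 7 is unimodal
print(count_peaks([2, 9, 3, 1, 6, 8, 2]))      # 2: hypothetical bimodal counts
```

A peak count depends on the chosen class width, so the same data can look unimodal or bimodal with different bins.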

AMR data often has a bimodal shape because there are often two separate populations of isolates – those that are susceptible and those that are resistant. AMR data is also often skewed. Depending on the population of interest, there might be more isolates that are susceptible than resistant, or vice versa. For example, in secondary care, more of the isolates tested may be resistant to an antimicrobial agent than susceptible, because the people who are sampled are more likely to have been treated with antimicrobials and to have suffered treatment failure. Figure 4 demonstrates a bimodal distribution of zone diameter measurements obtained by testing the susceptibility of Aeromonas salmonicida to oxolinic acid. (Note: Figure 4 is included in this module to demonstrate a typical distribution of AMR data, so it is not necessary to interpret the breakpoints depicted on the graph.)

The strengths and limitations of histograms are listed below:

Table 8 Strengths and limitations of histograms
Strengths	Limitations
Summarise large datasets	Cannot read exact values of each data point from histograms as the data is collapsed into categories
Show the relative frequency of occurrences of different data values	Difficult to compare two datasets
Demonstrate visually the variation and distribution shape of data, which is useful when determining the statistical approach you may take to explore associations with your data	Can only be used with continuous data
