4.2.1 Descriptive statistics for categorical variables

Descriptive statistics for categorical variables (nominal or ordinal) include counts and proportions. A count, or frequency, is the number of data points that take a given value (‘level’) of the variable. A proportion, or relative frequency, is the count for a given level divided by the total number of data points; it is often expressed as a percentage.

For example, consider the variable ‘resistance to levofloxacin’, measured for 120 methicillin-resistant Staphylococcus aureus (MRSA) isolates. This variable has two levels: ‘resistant’ (38 isolates) and ‘susceptible’ (the remaining 82 isolates). Descriptive statistics for this variable can be reported in a single sentence of text:

  • As a count: the AST results showed that 38 of the 120 MRSA isolates were resistant to levofloxacin.
  • As a proportion: the AST results showed that 31.7% of the 120 MRSA isolates were resistant to levofloxacin (i.e. 38/120 × 100 = 31.7%).
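
The arithmetic behind these two statements can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and are not part of the course material.

```python
# Counts taken from the levofloxacin example above
n_total = 120      # MRSA isolates tested
n_resistant = 38   # isolates classified as resistant

# The remaining isolates are susceptible
n_susceptible = n_total - n_resistant

# Proportion (relative frequency) expressed as a percentage
pct_resistant = n_resistant / n_total * 100
pct_susceptible = n_susceptible / n_total * 100

# Report the count and the proportion together
print(f"Resistant: {n_resistant}/{n_total} ({pct_resistant:.1f}%)")
print(f"Susceptible: {n_susceptible}/{n_total} ({pct_susceptible:.1f}%)")
```

Printing the count next to the percentage keeps the sample size visible to the reader.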

Below is an example from the literature of descriptive statistics reported in sentence form. The sentence gives relative frequencies (as percentages) for four variables, each recording the resistance status (resistant/susceptible) for one antimicrobial.

“A large number of MRSA isolates showed resistance to levofloxacin (83.9%), ciprofloxacin (83%), erythromycin (77.7%) and clindamycin (72.3%).” (Kot et al., 2020)

In the examples above, the outcome variable had only two levels (resistant/susceptible), so a table or graph would add little information. When a categorical variable has more than two levels, or when several two-level variables are presented side by side, the data may also be displayed in a frequency table (Table 3) or as a bar chart (see module Summarising and presenting AMR data). When presenting proportions, it is good practice to also provide the corresponding counts, or frequencies, especially when the sample size is small. The example in Table 3 shows the breed of cattle from which a target organism was recovered. The categorical variable ‘breed’ has four levels: Jersey, Guernsey, Holstein Friesian and Unknown.

Table 3 Example of a frequency table for nominal data type (cattle breed)
Breed               Frequency   Relative frequency (%)
Jersey                    234                     20.7
Guernsey                   63                      5.6
Holstein Friesian         800                     70.7
Unknown                    34                      3.0
Total                    1131                    100
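
A frequency table like Table 3 can be built by tallying the levels of the variable and dividing each count by the total. The sketch below uses Python's standard-library Counter. It assumes there is one breed label per animal; because the raw records are not part of this text, the list of labels is rebuilt from the counts in Table 3 purely for illustration.

```python
from collections import Counter

# Assumption: one breed label per animal. The raw records are not available
# here, so the list is reconstructed from the counts shown in Table 3.
breeds = (["Jersey"] * 234 + ["Guernsey"] * 63
          + ["Holstein Friesian"] * 800 + ["Unknown"] * 34)

counts = Counter(breeds)        # frequency of each level
total = sum(counts.values())    # total number of observations

# Relative frequency (%) of each level, rounded to one decimal place
rel_freq = {level: round(n / total * 100, 1) for level, n in counts.items()}

# Print counts and percentages side by side, as in the table
print(f"{'Breed':<20}{'Frequency':>10}{'Relative frequency (%)':>25}")
for level, n in counts.items():
    print(f"{level:<20}{n:>10}{rel_freq[level]:>25.1f}")
print(f"{'Total':<20}{total:>10}{100:>25}")
```

With real data, the list of labels would come from the dataset itself (for example, one column of a spreadsheet) rather than being reconstructed from known counts.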

4.2.2 Descriptive statistics for a numeric variable