4.3.4 Limitations of null hypothesis testing

Basing inferences solely on which side of an arbitrary cut-off a probability (i.e. a p-value) falls is not good practice. Other elements must be considered, such as the strength of association and the sample size used. When assessing the difference between two means, for example, the strength of association can be measured as the difference between, or the ratio of, the two means. The p-value tells us whether this difference is statistically significant; the size of the difference tells us whether the effect is clinically or biologically important. For example, suppose that the proportion of isolates resistant to erythromycin in a given selection of patients decreased from 10.0% to 9.8% after a set of recommendations for combating AMR had been applied in a hospital for one year, and that, given the sample sizes, the difference is statistically significant. Is a decrease of 0.2 percentage points meaningful? If you were the funder of this intervention, would you consider it a successful outcome?
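The role of sample size in the erythromycin example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the isolate counts are hypothetical (the text gives no sample sizes), the helper name two_proportion_p_value is ours, and a pooled two-proportion z-test with a normal approximation is just one reasonable choice of test.

```python
import math

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for H0: both proportions are equal.

    Uses a pooled z-test (normal approximation), which is an
    assumption of this sketch, not a method named in the text.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: resistant isolates / isolates tested, before and after.
# Both scenarios show the same decrease, 10.0% -> 9.8%.
p_small = two_proportion_p_value(100, 1_000, 98, 1_000)
p_large = two_proportion_p_value(50_000, 500_000, 49_000, 500_000)

print(f"1,000 isolates per period:   p = {p_small:.3f}")
print(f"500,000 isolates per period: p = {p_large:.4f}")
```

With these made-up counts the same 0.2 percentage point decrease is far from significant in the small sample and clearly significant in the very large one. The effect size is identical in both cases, so the p-value alone says nothing about whether the decrease matters clinically.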

Although p-values are widely used throughout the scientific literature, they are commonly misinterpreted, and many scientists and statisticians now advocate abandoning them altogether. In practical terms, using an arbitrary cut-off (e.g. 0.05) means that an observed difference with p=0.045 is declared ‘significant’ while one with p=0.055 is not, even though the two results provide almost the same evidence. Statistical tests indicate the strength of evidence for or against a given hypothesis rather than giving a binary answer. It is more useful to think of p=0.001 as providing stronger evidence for an effect than either p=0.04 or p=0.06. There is a range of advanced alternative approaches to null hypothesis testing, which report and interpret p-values in different ways. The key thing to remember is always to interpret the strength of the statistical evidence alongside how clinically or biologically important the effect is.
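One way to see why p=0.045 and p=0.055 should not be treated so differently is to convert each p-value back to the test statistic that would produce it. The sketch below assumes a two-sided test with a normally distributed statistic, which is our simplifying assumption; the helper name z_from_two_sided_p is hypothetical.

```python
from statistics import NormalDist

def z_from_two_sided_p(p):
    """Absolute z-score that yields a two-sided p-value of p
    (assumes a standard normal test statistic)."""
    return NormalDist().inv_cdf(1 - p / 2)

# The p-values discussed in the text
z_scores = {p: z_from_two_sided_p(p) for p in (0.001, 0.04, 0.045, 0.055, 0.06)}

for p, z in z_scores.items():
    print(f"p = {p:<5}  |z| = {z:.2f}")
```

Under this assumption, the statistics behind p=0.045 and p=0.055 differ by less than a tenth of a standard error, whereas p=0.001 corresponds to a visibly larger statistic. The cut-off at 0.05 separates two results that are almost indistinguishable as evidence.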
