4.5.1 The effect of clustering on sample size

Cluster sampling is a cost-effective method that is particularly useful for populations spread over a wide geographical area: data collection is concentrated in a limited number of sites, and selecting several clusters allows diverse subgroups to be included. The three key steps in cluster sampling are:

  1. Divide the population into clusters, such as hospitals, wards or clinics.
  2. Randomly select a number of clusters.
  3. Sample within the selected clusters – data is collected from all participants in each selected cluster.
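The three steps above can be sketched in Python as follows. This is a minimal illustration of one-stage cluster sampling, assuming each participant is a record (here a dict) with a field naming their cluster; the function name `one_stage_cluster_sample` and the ward data are hypothetical.

```python
import random
from collections import defaultdict


def one_stage_cluster_sample(participants, cluster_key, n_clusters, seed=None):
    """Randomly select n_clusters and return every participant in them.

    participants: list of dicts (hypothetical participant records)
    cluster_key:  field identifying each participant's cluster, e.g. "ward"
    """
    # Step 1: divide the population into clusters
    clusters = defaultdict(list)
    for person in participants:
        clusters[person[cluster_key]].append(person)

    # Step 2: randomly select clusters, without replacement
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)

    # Step 3: include all participants within the selected clusters
    sample = [person for c in chosen for person in clusters[c]]
    return chosen, sample


# Hypothetical population: 200 patients spread evenly over 10 wards
population = [{"id": i, "ward": f"ward_{i % 10}"} for i in range(200)]
wards, sample = one_stage_cluster_sample(population, "ward", n_clusters=3, seed=1)
print(wards, len(sample))
```

With 20 patients per ward, selecting three wards yields a sample of 60 patients, all drawn from just those three wards.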

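When data is clustered, the effective sample size is smaller than the number of participants recruited. Observations within the same cluster tend to be more similar to one another than observations from different clusters, so each additional participant from the same cluster adds less new information. This means a larger overall sample size is needed to achieve the same statistical power as a simple random sample.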
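One widely used way to quantify this loss is the design effect, DEFF = 1 + (m − 1)ρ, where m is the number of participants per cluster and ρ is the intra-cluster correlation coefficient. Dividing the number of participants by DEFF gives the effective sample size; multiplying the sample size required under simple random sampling by DEFF gives the sample size needed with clustering. The sketch below assumes equal-sized clusters, and the function names and figures are illustrative only.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for clusters of (roughly) equal size."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    return n_total / design_effect(cluster_size, icc)


def clustered_sample_size(n_srs: int, cluster_size: float, icc: float) -> int:
    """Participants needed to match the power of a simple random sample of n_srs."""
    n = n_srs * design_effect(cluster_size, icc)
    # Round first so floating-point noise (e.g. 735.0000000001) doesn't add a participant
    return math.ceil(round(n, 6))


# Hypothetical design: 30 patients per ward, ICC = 0.05
print(design_effect(30, 0.05))
print(effective_sample_size(600, 30, 0.05))
print(clustered_sample_size(300, 30, 0.05))
```

With these hypothetical figures the design effect is 2.45, so 600 clustered participants carry roughly the information of 245 independent ones, and a study needing 300 participants under simple random sampling would need 735 with clustering.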

Cluster sampling also carries a higher risk of bias than other sampling techniques, because participants within a cluster are more likely to be similar to one another. For example, a randomly selected hospital ward may contain only female or only male patients, leading to over- or under-representation of certain subgroups within the population.
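A small illustration of this risk, using invented ward compositions rather than real data: the share of female patients in a single-ward sample can differ sharply from the share across all wards. The ward names and counts below are hypothetical.

```python
# Hypothetical ward compositions: (female, male) patient counts
wards = {
    "maternity": (28, 0),
    "urology": (4, 24),
    "general_medicine": (15, 17),
    "orthopaedics": (12, 14),
    "cardiology": (10, 18),
}


def proportion_female(selected):
    """Share of female patients across the selected wards."""
    female = sum(wards[w][0] for w in selected)
    total = sum(sum(wards[w]) for w in selected)
    return female / total


population_share = proportion_female(wards)  # all wards together
for name in wards:
    share = proportion_female([name])  # a sample consisting of one ward only
    print(f"{name:>17}: {share:.0%} female (all wards: {population_share:.0%})")
```

Sampling only the maternity ward would include no men at all, while sampling only the urology ward would heavily under-represent women.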

To account for clustering, researchers estimate the intra-cluster correlation coefficient (ICC), which reflects how similar observations within the same cluster are and is used to work out how much the sample size must be increased. A higher ICC implies a greater need to increase the sample size to account for the clustering effect; however, time, cost and resources will also influence the final sample size.
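As an illustration of how the ICC might be estimated from pilot data, the sketch below uses the one-way analysis of variance (ANOVA) estimator, ρ̂ = (MSB − MSW) / (MSB + (m − 1)MSW), where MSB and MSW are the between- and within-cluster mean squares and m is the cluster size. It assumes equal cluster sizes and, as is common practice, truncates negative estimates to zero; other estimators exist, and the function name and pilot data are hypothetical.

```python
from statistics import mean


def icc_anova(clusters: list[list[float]]) -> float:
    """One-way ANOVA estimate of the ICC, assuming equal-sized clusters."""
    k = len(clusters)       # number of clusters
    m = len(clusters[0])    # participants per cluster
    if any(len(c) != m for c in clusters):
        raise ValueError("this sketch assumes equal cluster sizes")

    grand_mean = mean(y for c in clusters for y in c)
    cluster_means = [mean(c) for c in clusters]

    # Between-cluster mean square (k - 1 degrees of freedom)
    msb = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (k - 1)
    # Within-cluster mean square (k(m - 1) degrees of freedom)
    msw = sum((y - cm) ** 2 for c, cm in zip(clusters, cluster_means) for y in c) / (k * (m - 1))

    # Negative estimates are truncated to zero (an assumption of this sketch)
    return max(0.0, (msb - msw) / (msb + (m - 1) * msw))


# Hypothetical pilot scores: 4 clinics, 3 patients each
pilot = [[4, 5, 6], [6, 7, 8], [9, 10, 11], [3, 4, 5]]
print(icc_anova(pilot))
```

The resulting estimate is what feeds into the sample size inflation: the larger it is, the larger the sample required.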

4.5.2 The effect of subgroups on sample size