4.5.1 The effect of clustering on sample size
Cluster sampling is a cost-effective method that is particularly useful for populations with a wide geographical distribution, as it allows for the inclusion of diverse subgroups. The three key steps in cluster sampling are:
- Division into clusters, such as hospitals, wards or clinics.
- Random selection of clusters.
- Sampling within a cluster – data is collected from all participants within the selected clusters.
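The three steps above can be sketched in a few lines of Python. This is a minimal illustration only: the clinic names, participant IDs and the `cluster_sample` helper are hypothetical and not taken from any particular study, and the sketch assumes one-stage sampling in which every participant in a selected cluster is included.

```python
import random

# Step 1: division into clusters (hypothetical clinics and participant IDs).
population = {
    "clinic_A": ["A1", "A2", "A3"],
    "clinic_B": ["B1", "B2"],
    "clinic_C": ["C1", "C2", "C3", "C4"],
    "clinic_D": ["D1", "D2", "D3"],
}


def cluster_sample(clusters, n_clusters, seed=None):
    """Return the randomly chosen cluster names and all of their participants."""
    rng = random.Random(seed)
    # Step 2: random selection of whole clusters (names sorted so that a
    # given seed always produces the same selection).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 3: collect data from every participant in each selected cluster.
    participants = [p for name in chosen for p in clusters[name]]
    return chosen, participants


chosen, participants = cluster_sample(population, n_clusters=2, seed=1)
print(chosen, participants)
```

Note that the unit of randomisation here is the clinic, not the individual, which is why participants from the same clinic always enter or leave the sample together.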
Clustered data effectively reduces the usable sample size compared with a non-clustered sample, because participants within the same cluster tend to be more similar to one another than to participants in other clusters. This means a larger overall sample size is needed to achieve the same statistical power.
Additionally, cluster sampling carries a higher risk of bias compared with other sampling techniques. There is a higher likelihood of population homogeneity within a cluster; for example, selecting a single hospital ward at random may result in sampling only females or only males, leading to over- or under-representation of certain subgroups within the population.
To account for this within-cluster similarity, researchers estimate the intra-cluster correlation coefficient (ICCC), which reflects the similarity between observations within each cluster and helps to determine the effective sample size needed. A higher ICCC implies a greater need to increase the sample size to account for the clustering effect; however, time, cost and resources will also influence the sample size.
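One common way to quantify this adjustment is the design effect, 1 + (m - 1) × ICCC, where m is the number of participants per cluster. The sketch below assumes equal-sized clusters and a known ICCC; the function names and the example figures are illustrative assumptions, not values from this text.

```python
import math


def design_effect(cluster_size, iccc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICCC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= iccc <= 1.0:
        raise ValueError("iccc is assumed to lie between 0 and 1")
    return 1.0 + (cluster_size - 1) * iccc


def effective_sample_size(n_total, cluster_size, iccc):
    """Size of the non-clustered sample that carries the same information."""
    return n_total / design_effect(cluster_size, iccc)


def inflated_sample_size(n_unclustered, cluster_size, iccc):
    """Sample size needed under clustering to match a non-clustered design."""
    required = n_unclustered * design_effect(cluster_size, iccc)
    # Round away floating-point noise before rounding up to whole participants.
    return math.ceil(round(required, 6))


# Illustrative figures: 21 participants per cluster and an ICCC of 0.05.
print(design_effect(21, 0.05))
print(inflated_sample_size(200, 21, 0.05))
print(effective_sample_size(400, 21, 0.05))
```

With these illustrative figures the design effect is 2, so a study that would need 200 participants without clustering needs 400 with it. When the ICCC is 0 the design effect is 1 and no inflation is required; as the ICCC approaches 1, each cluster contributes little more information than a single participant.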
4.5 Additional considerations when calculating sample sizes

