4.5.1 The effect of clustering on sample size

Cluster sampling is a cost-effective method that is particularly useful for populations with a wide geographical distribution, as it allows for the inclusion of diverse subgroups. The three key steps in cluster sampling are:

Division into clusters, such as hospitals, wards or clinics.
Random selection of clusters.
Sampling within a cluster – data is collected from all participants within the selected clusters.
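The three steps can be sketched in Python as a one-stage cluster sample. This is a minimal illustration only: the ward names and participant IDs are hypothetical, and `cluster_sample` is a helper written for this example rather than part of any library.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sample (hypothetical helper).

    clusters: dict mapping a cluster name to a list of participant IDs.
    Randomly selects n_clusters clusters without replacement, then keeps
    every participant in each selected cluster.
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    # Step 2: random selection of clusters (sorted so the seed gives a stable result)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 3: collect data from all participants within the selected clusters
    participants = [p for name in chosen for p in clusters[name]]
    return chosen, participants


# Step 1: division of a (hypothetical) population into clusters, here hospital wards
wards = {
    "ward_A": ["p1", "p2", "p3"],
    "ward_B": ["p4", "p5"],
    "ward_C": ["p6", "p7", "p8", "p9"],
    "ward_D": ["p10", "p11"],
}

chosen, sample = cluster_sample(wards, n_clusters=2, seed=1)
print(chosen, sample)
```

Note that the number of participants depends on which clusters happen to be drawn, because every member of a selected cluster is included.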

When data is clustered, the effective sample size is smaller than that of a non-clustered sample with the same number of participants, because observations within a cluster tend to be more similar to one another than to observations in other clusters. This means a larger overall sample size is needed to achieve the same statistical power.
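The text does not give a formula, but a widely used way to quantify this loss is the design effect. Assuming every cluster has the same size m and the intra-cluster correlation coefficient is ρ, the design effect is DEFF = 1 + (m - 1)ρ, and a clustered sample of n participants carries roughly the information of n / DEFF independent observations. The sketch below applies this; the function names and the example figures (20 wards of 30 patients, ρ = 0.05) are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total, cluster_size, icc):
    """Approximate number of independent observations a clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical example: 20 wards x 30 patients = 600 participants, ICC = 0.05
print(effective_sample_size(600, 30, 0.05))  # roughly 245
```

With these assumed values the design effect is 2.45, so 600 clustered participants are worth only about 245 independent observations. When cluster sizes vary, this simple formula is only an approximation.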

Additionally, cluster sampling carries a higher risk of bias than other sampling techniques. Members of the same cluster are more likely to be alike; for example, a randomly selected hospital ward may contain only female or only male patients, leading to over- or under-representation of certain subgroups within the population.
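To see how homogeneous clusters can distort a sample, the sketch below compares the proportion of women in a hypothetical population of three wards with the proportion obtained if only one ward were selected. The ward data and the `subgroup_share` helper are invented for illustration.

```python
def subgroup_share(people, attribute, value):
    """Proportion of people whose `attribute` equals `value`."""
    if not people:
        raise ValueError("no people to summarise")
    return sum(1 for p in people if p[attribute] == value) / len(people)


# Hypothetical wards: ward_1 is female-only, ward_2 male-only, ward_3 mixed
wards = {
    "ward_1": [{"sex": "F"} for _ in range(4)],
    "ward_2": [{"sex": "M"} for _ in range(4)],
    "ward_3": [{"sex": "F"}, {"sex": "M"}, {"sex": "F"}, {"sex": "M"}],
}
population = [p for ward in wards.values() for p in ward]

print(subgroup_share(population, "sex", "F"))       # 0.5 in the whole population
print(subgroup_share(wards["ward_1"], "sex", "F"))  # 1.0 if only ward_1 is selected
print(subgroup_share(wards["ward_2"], "sex", "F"))  # 0.0 if only ward_2 is selected
```

Selecting more clusters reduces, but does not remove, this risk when clusters differ systematically.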

To account for clustering, researchers estimate the intra-cluster correlation coefficient (ICCC), which reflects how similar observations within the same cluster are, and use it to determine the sample size needed. A higher ICCC means the sample size must be increased further to offset the clustering effect; however, time, cost and resources will also influence the final sample size.
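One common way to estimate the ICCC, assuming equal-sized clusters and a continuous outcome, is the one-way ANOVA estimator ICCC = (MSB - MSW) / (MSB + (m - 1)MSW), where MSB and MSW are the between-cluster and within-cluster mean squares and m is the cluster size. The estimate can then be used to inflate a sample size calculated for simple random sampling by the design effect 1 + (m - 1) × ICCC. The following is a sketch under those assumptions; the helper names and example figures are hypothetical, and in practice the ICCC is often taken from pilot data or published studies.

```python
import math
from statistics import mean


def anova_icc(clusters):
    """One-way ANOVA estimate of the ICCC for equal-sized clusters.

    clusters: list of lists of numeric outcomes, one inner list per cluster.
    Negative estimates (possible by chance) are truncated at 0 here.
    """
    k = len(clusters)
    m = len(clusters[0]) if clusters else 0
    if k < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 equal-sized clusters of size >= 2")
    grand = mean(y for c in clusters for y in c)
    cluster_means = [mean(c) for c in clusters]
    # Between-cluster mean square: spread of cluster means around the grand mean
    msb = m * sum((cm - grand) ** 2 for cm in cluster_means) / (k - 1)
    # Within-cluster mean square: spread of observations around their cluster mean
    msw = sum(
        (y - cm) ** 2 for c, cm in zip(clusters, cluster_means) for y in c
    ) / (k * (m - 1))
    return max(0.0, (msb - msw) / (msb + (m - 1) * msw))


def clustered_sample_size(n_srs, cluster_size, icc):
    """Inflate a simple-random-sample size by the design effect.

    Returns (total participants, number of clusters), both rounded up.
    """
    deff = 1 + (cluster_size - 1) * icc
    # Small tolerance stops floating-point error (e.g. 390.00000000000006)
    # from rounding up to an extra participant.
    n_total = math.ceil(n_srs * deff - 1e-9)
    n_clusters = math.ceil(n_total / cluster_size)
    return n_total, n_clusters


# Hypothetical: 200 participants needed under simple random sampling,
# wards of 20 patients, assumed ICCC of 0.05
print(clustered_sample_size(200, 20, 0.05))  # (390, 20)
```

Rounding up at both stages keeps the design at or above the target size. Truncating negative ANOVA estimates at zero is a common but not universal convention.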

Sampling (human health) (2025)