5 Cluster sampling
The final broad class of survey sampling schemes we will consider is known as cluster sampling. Again we are concerned with a population which is naturally divided into sub-populations, possibly on many different criteria, as with stratification. A simple example is where we sample individuals from a list of the addresses at which they reside, so that the sampling unit is an address at which there may be several individuals. In stratified sampling, we would treat each address as a stratum and take a sr sample from each one. For ease of access, however, it is more natural to sample addresses and then use information on all individuals at each sampled address. We will not sample all addresses (that is, all the ‘strata’ in this example): to do so would be prohibitive and would imply observing the whole population. Instead we take a sr sample of addresses, which are now called clusters rather than strata, to mark the fact that the sub-populations are typically much smaller than ‘strata’ and that we observe all individuals in each sampled unit. A one-stage cluster sample is thus a sr sample of the clusters in which we observe all individuals in the sampled clusters.
If we instead take samples of the individuals within the selected clusters, we have what is called sub-sampling or two-stage cluster sampling. This extends to multi-stage cluster sampling, where we choose from a set of primary units, then from secondary units within the chosen primary units, and so on. For example, primary units may be educational authorities, secondary units the schools within them, tertiary units the classes in the schools, etc.
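The two-stage selection just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `two_stage_sample`, the sub-sample size `k` and the school data are all hypothetical, and we assume sr sampling is used at both stages.

```python
import random


def two_stage_sample(clusters, m, k, rng):
    """Draw m clusters by sr sampling, then up to k members from each.

    `clusters` maps a cluster label to the list of its members.
    (Hypothetical helper; sr sampling at both stages is an assumption.)
    """
    # Stage 1: sr sample of m primary units (cluster labels).
    chosen = rng.sample(sorted(clusters), m)
    # Stage 2: sr sample of at most k secondary units within each chosen cluster.
    return {
        label: rng.sample(clusters[label], min(k, len(clusters[label])))
        for label in chosen
    }


# Hypothetical population: schools (primary units) and their pupils.
schools = {
    "school_A": ["a1", "a2", "a3", "a4"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3"],
    "school_D": ["d1", "d2", "d3", "d4", "d5"],
}
sample = two_stage_sample(schools, m=2, k=3, rng=random.Random(0))
print(sample)
```

Choosing `k` at least as large as every cluster size returns all members of each chosen cluster, which is the one-stage scheme.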
In one-stage cluster sampling, the population consists of a set of clusters and we take a sr sample of them. Suppose we have $$M$$ clusters of sizes $$N_{1}, N_{2}, \ldots, N_{M}$$ and the cluster means are $$\bar{Y}_{i}$$ and within-cluster variances are $$S_{i}^{2} \left(i = 1, 2, \ldots, M\right)$$. The population mean and variance are again $$\bar{Y}$$, $$S^{2}$$. We will take a sr sample of $$m$$ clusters and observe all members of the chosen clusters to obtain a sample of size $$n>m$$.
The simplest case is where all clusters are of the same size $$L$$, say. Thus $$N=ML$$ and so the sampling fraction is $$f=n/N=m/M$$.
The cluster sample mean $$\bar{y}_{cl}$$ is the sample average of all the observations and is thus $$\bar{y}_{cl}=\sum_{i=1}^{m}\bar{y}_{i}/m$$ for equal-sized clusters, where the $$\bar{y}_{i}\left(i = 1, 2, \ldots, m\right)$$ are the means of the sampled clusters. We are effectively just taking a sr sample of $$m$$ of the $$M$$ cluster means $$\bar{Y}_{i}\left(i = 1, 2, \ldots, M\right)$$, and properties of the estimator $$\bar{y}_{cl}$$ follow from the results for basic sr sampling. Thus $$\bar{y}_{cl}$$ is unbiased for $$\bar{Y}$$ with $$Var\left(\bar{y}_{cl}\right)=\frac{1-f}{m}\sum_{i=1}^{M}\left(\bar{Y}_{i}-\bar{Y}\right)^{2}/\left(M-1\right)$$.
Consider the alternative of a sr sample mean $$\bar{y}$$ based on a sample of size $$n=mL$$ drawn from the whole population. We find that $$Var\left(\bar{y}\right)-Var\left(\bar{y}_{cl}\right)$$ is a positive multiple of $$\bar{S}^{2}-S^{2}$$, where $$\bar{S}^{2}=\sum_{i=1}^{M}S_{i}^{2}/M$$ is the average within-cluster variance, and so we conclude that $$\bar{y}_{cl}$$ will be more efficient than $$\bar{y}$$ if the average within-cluster variance is larger than the overall population variance. This is essentially the opposite of what we found for stratified sampling. But it is the administrative convenience of cluster sampling which is its main appeal. We will of course have to estimate $$Var\left(\bar{y}_{cl}\right)$$ by the obvious sample analogue, $$\frac{1-f}{m}\sum_{i=1}^{m}\left(\bar{y}_{i}-\bar{y}_{cl}\right)^{2}/\left(m-1\right)$$, to make use of the above results.
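The results above can be checked numerically on a small artificial population. The sketch below enumerates every possible sr sample of $$m$$ clusters, so the exact mean and variance of $$\bar{y}_{cl}$$ are available for comparison with the formula. The population values and the helper names are hypothetical. We assume variances use the divisor (size minus one), and that the "positive multiple" is $$(1-f)M(L-1)/\{mL(M-1)\}$$, which is what the usual between/within decomposition of $$(N-1)S^{2}$$ gives.

```python
import random
from itertools import combinations
from statistics import mean


def variance(values):
    """Variance with divisor len(values) - 1 (assumed convention for S^2)."""
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values) / (len(values) - 1)


def var_cluster_mean(clusters, m):
    """Var(y_cl) = (1 - f)/m * sum_i (Ybar_i - Ybar)^2 / (M - 1), equal-sized clusters."""
    M = len(clusters)
    return (1 - m / M) / m * variance([mean(c) for c in clusters])


def var_sr_mean(clusters, m):
    """Var(ybar) = (1 - f) S^2 / n for a sr sample of n = m * L individuals."""
    M, L = len(clusters), len(clusters[0])
    population = [y for c in clusters for y in c]
    return (1 - m / M) * variance(population) / (m * L)


# Hypothetical population: M = 5 clusters, each of size L = 4.
rng = random.Random(1)
clusters = [[rng.gauss(10, 3) for _ in range(4)] for _ in range(5)]
M, L, m = len(clusters), len(clusters[0]), 2
population = [y for c in clusters for y in c]

# Enumerate all equally likely samples of m clusters and compute y_cl for each.
estimates = [mean([mean(clusters[i]) for i in s]) for s in combinations(range(M), m)]
centre = mean(estimates)
exact_var = sum((e - centre) ** 2 for e in estimates) / len(estimates)

# Difference of the two variances versus the stated multiple of (S_bar^2 - S^2).
f = m / M
s_bar_sq = mean([variance(c) for c in clusters])
lhs = var_sr_mean(clusters, m) - var_cluster_mean(clusters, m)
rhs = (1 - f) * M * (L - 1) / (m * L * (M - 1)) * (s_bar_sq - variance(population))

print("mean of y_cl over all samples:", centre, "population mean:", mean(population))
print("exact Var(y_cl):", exact_var, "formula:", var_cluster_mean(clusters, m))
print("Var(ybar) - Var(y_cl):", lhs, "multiple of (S_bar^2 - S^2):", rhs)
```

Each printed pair should agree up to floating-point rounding. The sign of the last pair shows which of the two schemes is the more efficient for this particular population.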
What if the clusters are not all the same size?
This is a more realistic scenario, and three types of estimator are used: the cluster sample ratio; the cluster sample total, if the total population size is known; or (as a useful quick estimate) the unweighted average of the chosen cluster sample means. Alternatively, and usefully, we can replace sr sampling with a scheme in which population members have varying chances of being chosen: in particular, sampling with probability proportional to size. (See Barnett, 2002, sections 5.2, 5.3 for further details.)
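To make the three estimators concrete, here is a minimal sketch for clusters of unequal size. The formulas are the usual textbook forms, which we assume are the ones intended here (the cited reference gives the full treatment). The function names and the toy population are hypothetical.

```python
import random
from itertools import combinations


def ratio_estimate(sampled):
    """Cluster sample ratio: sampled y-total divided by sampled cluster-size total."""
    return sum(sum(c) for c in sampled) / sum(len(c) for c in sampled)


def total_estimate(sampled, M, N):
    """Cluster sample total: M/(m N) times the sum of sampled cluster totals.

    Requires the population size N to be known.
    """
    m = len(sampled)
    return M * sum(sum(c) for c in sampled) / (m * N)


def unweighted_mean_estimate(sampled):
    """Unweighted average of the sampled cluster means (a quick estimate)."""
    return sum(sum(c) / len(c) for c in sampled) / len(sampled)


# Hypothetical population of M = 4 clusters with unequal sizes.
clusters = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
M = len(clusters)
N = sum(len(c) for c in clusters)
population_mean = sum(sum(c) for c in clusters) / N

# One sr sample of m = 2 clusters.
sampled = random.Random(0).sample(clusters, 2)
print("ratio:", ratio_estimate(sampled))
print("total:", total_estimate(sampled, M, N))
print("unweighted:", unweighted_mean_estimate(sampled))

# Averaging the cluster-sample-total estimate over all possible samples
# recovers the population mean, illustrating that it is unbiased.
all_samples = [[clusters[i] for i in s] for s in combinations(range(M), 2)]
avg_total = sum(total_estimate(s, M, N) for s in all_samples) / len(all_samples)
print("average of total estimate:", avg_total, "population mean:", population_mean)
```

When all clusters have the same size the three functions return the same value, namely $$\bar{y}_{cl}$$.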
In practice, cluster sampling is usually employed within more complex multi-stage sampling schemes, where the clusters selected in the primary cluster sample are themselves sub-sampled, perhaps on various different criteria. These more complicated sampling methods take us beyond our brief in this introductory review.