Guide to Statistics: "Sampling for Surveys"


1 Introduction

Sample survey methods play a vital role in helping us to understand different aspects of the world we live in. They provide a statistical basis for collecting and interpreting data from finite populations, and they underlie much of the work on opinion polls, market research and surveys of social, medical, environmental and other issues.

We need statistically sound methods of data analysis and inference designed for finite populations, and also methods for handling the non-statistical difficulties that arise in applying them (e.g. non-sampling errors, non-response, biases produced by sensitive topics, biases produced by question wording, etc.).

The finite population on which we conduct a survey might be very large (all voters in the UK) and the survey sample size quite small (perhaps 1000 or so voters), or the population may be small (the 140 members of a Social Club) with a much higher sampling fraction (e.g. a sample of 40 members).
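As a small worked illustration of the sampling fraction in the Social Club example, write $$n$$ for the sample size and $$N$$ for the population size (the notation used later in this section); the symbol $$f$$ is an assumption introduced here for convenience:

$$f = \frac{n}{N} = \frac{40}{140} \approx 0.29$$

So roughly 29% of the club members would be sampled, whereas a sample of about 1000 from a national electorate covers only a very small fraction of the population.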

Target population: This is the total finite population of interest, e.g. all voters in the UK.

Study population: This is the population we will actually study, e.g. all voters in a chosen set of constituencies in which we will take observations (hoping it ‘represents’ the target population).

Population variable and population parameter: The population variable is what we measure on each population member (e.g. voting intention: which party?). The population parameter is the associated characteristic of interest (e.g. the proportion who intend to vote for Party A, or the implied total number of votes for that party). Note that voting intention and the votes actually cast may turn out to be quite different; this is one of the non-statistical problems we face.

Sampling units and sampling frame: The entities we will actually sample and the complete set of them, e.g. shoppers in selected high streets at different times, or eligible occupants at electoral roll addresses in selected wards. Choices have to be made in all these respects.

Why take a sample? A full population study (a census) is seldom feasible in terms of accessibility, time or cost. These three factors drive the need to sample: the aim is to obtain sufficiently reliable statistical information about the target population at an affordable cost. See Barnett (2002).

How should we sample? We must draw a sample that is representative of the population and whose statistical properties we can assess. In particular, accessibility sampling (‘take what’s to hand’) or judgmental sampling (deliberate subjective choice of sample members) will inevitably lead to bias and will not have measurable statistical properties.
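The contrast between accessibility sampling and a probability-based scheme can be sketched with a small simulation. Everything in it is hypothetical: the membership list, the variable (`years` of membership), the ordering of the list, and the choice of simple random sampling without replacement as the probability-based scheme are assumptions made for illustration only.

```python
import random
from statistics import mean

# Hypothetical population (an assumption for illustration): years of
# membership for 140 club members, listed longest-serving first.
# Values run from 28 down to 1, with five members at each value.
N = 140
years = [1 + (N - 1 - i) // 5 for i in range(N)]
population_mean = mean(years)          # 14.5

n = 40

# Accessibility sampling: take the first n names on the list
# ("what's to hand"). These are all long-serving members.
accessibility_mean = mean(years[:n])   # 24.5, far above 14.5

# Simple random sampling without replacement, repeated many times so
# that the average behaviour of the estimator can be seen.
rng = random.Random(1)                 # fixed seed for reproducibility
srs_means = [mean(rng.sample(years, n)) for _ in range(2000)]
average_srs_mean = mean(srs_means)

print("population mean:       ", population_mean)
print("accessibility sample:  ", accessibility_mean)
print("average of SRS means:  ", round(average_srs_mean, 2))
```

In this sketch the accessibility sample misses the population mean by a fixed amount however often it is repeated, while the random-sample means scatter around the population mean with a spread that can itself be estimated.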

Suppose we are interested in a quantity (a variable) $$Y$$ measured on the $$N$$ members of the population, so that the population can be represented as $$Y_{1}, Y_{2},\ldots , Y_{N}$$. We will be interested in a characteristic of the population, such as:
Population mean $$\bar{Y} = \left(\sum_{i=1}^{N}Y_{i}\right)/N$$
Population total $$Y_{T} = \sum_{i=1}^{N}Y_{i}$$
The population proportion, $$P$$, of population members falling into some category with respect to the variable $$Y$$, e.g. the voters who say they will vote for party A.
Of course, $$\bar{Y}$$, $$Y_{T}$$ and $$P$$ will not be known and the aim of sample survey methods is to construct statistically-sound methods to make inferences about the population values from a sample of $$n < N$$ values $$y_{1} , y_{2}, \ldots , y_{n}$$ drawn from the population.  (Note that not all $$Y_{i}$$, nor all $$y_{i}$$, necessarily take different values; in the voting example there will only be a few possible values which can be taken.)
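The three population characteristics, and the idea of drawing $$n < N$$ values from the population, can be sketched in Python. The data are hypothetical (a 140-member population with made-up voting intentions), the function names are illustrative, and simple random sampling without replacement is assumed as the sampling scheme. Coding each member as 1 (intends to vote for party A) or 0 (otherwise) shows that the proportion $$P$$ is simply the population mean of that indicator variable.

```python
import random

def population_mean(values):
    """Mean of a variable over all members of a finite population."""
    return sum(values) / len(values)

def population_total(values):
    """Total of a variable over all members of a finite population."""
    return sum(values)

def population_proportion(labels, category):
    """Proportion of members whose label equals `category`."""
    return sum(1 for lab in labels if lab == category) / len(labels)

# Hypothetical population (made-up data): 140 members, of whom 56 say
# they will vote for party A, 49 for B and 35 for C.
rng = random.Random(2024)              # fixed seed for reproducibility
intentions = ["A"] * 56 + ["B"] * 49 + ["C"] * 35
rng.shuffle(intentions)
N = len(intentions)

# Population characteristics (unknown in practice; known here only
# because the whole population was constructed by hand).
P = population_proportion(intentions, "A")
indicator = [1 if v == "A" else 0 for v in intentions]
Y_T = population_total(indicator)      # number of party A voters
Y_bar = population_mean(indicator)     # equals P for a 0/1 variable

# A sample of n < N members, drawn without replacement, and the
# corresponding sample estimate of P.
n = 40
sample = rng.sample(intentions, n)
p_hat = population_proportion(sample, "A")

print("N =", N, " Y_T =", Y_T, " P =", P)
print("n =", n, " sample proportion for A =", p_hat)
```

The sample proportion will generally differ from $$P$$; quantifying how far it is likely to be from $$P$$ is the kind of inference that sample survey methods are designed to support.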

Contents