Introduction to sample design

In statistics, the target population is understood to be a set of units from which we wish to obtain certain information. This set must be perfectly defined so that it is known, in an unequivocal manner, whether or not a certain unit belongs to the population.

Depending on the economic reality that is to be measured, the most appropriate "unit" from which information is to be obtained is established, as well as the criteria for considering this unit to be part of the target population or not. In other words, the statistical unit and the scope to which the population to be investigated is restricted must be clearly established. The size of the population is the number of elements or units that make up said population.

The concept of population set out above, as a set of units from which information is desired, is an ideal model. In practice, the sample is selected from a support material, a list of units called a frame, which should coincide as closely as possible with the target population. Strictly speaking, the sampling frame is defined as the list of sampling units from which the sample is selected.

Two methods can be considered to get to know the characteristics of the study population:

Exhaustive or census survey. The research is carried out on each and every one of the units that make up the population.
Sample survey. A set of elements is selected to form the sample, on which the characteristics of interest to the study are investigated. This sample, duly selected by means of sampling design techniques, represents the entire population, so that the population results can be estimated from the sample results by means of statistical inference. In other words, conclusions valid for the whole population are drawn from data obtained on a fraction of it (the sample).

The advantages of sample surveys are speed and the fact that they do not impose the response burden (completion of questionnaires) on the whole population. On the other hand, the statistical results (i.e. the estimates) are random to some degree and differ to a greater or lesser extent from those that would have been obtained by investigating the whole population. Because the variables of interest are measured in only part of the population, the final values obtained for the whole population are estimates that are more or less close to the true values and that vary according to the particular sample selected.
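The dependence of the estimates on the particular sample selected can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the population is made up (log-normally distributed values standing in for an economic variable), the samples are simple random samples, and the function name sample_mean_estimates is hypothetical.

```python
import random
import statistics


def sample_mean_estimates(population, sample_size, n_samples, seed=0):
    """Draw repeated simple random samples (without replacement)
    and return the sample mean obtained from each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical population of 1,000 units with made-up, skewed values.
rng = random.Random(42)
population = [rng.lognormvariate(10, 1) for _ in range(1000)]

# The value a census would give.
true_mean = statistics.mean(population)

# 200 different samples of 100 units each give 200 different estimates.
estimates = sample_mean_estimates(population, sample_size=100, n_samples=200)

print(f"census value:      {true_mean:.1f}")
print(f"smallest estimate: {min(estimates):.1f}")
print(f"largest estimate:  {max(estimates):.1f}")
```

Each run of the loop corresponds to one possible sample; the spread between the smallest and largest estimate shows how much the result of a single survey could differ from the census value.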

Stratified sampling

Sampling is the procedure or set of techniques for obtaining one or more samples that can be representative of a population. Strictly speaking, the term 'sampling' also applies to the method of selecting the sample.

There are several types of sampling, although the one normally used in economic surveys conducted by the National Statistics Institute is stratified sampling.

It consists of dividing the population into categories (strata) that differ from one another but are each highly homogeneous internally with respect to some characteristic. The aim of this type of sampling is to ensure that all strata of interest are adequately represented in the sample. Each stratum is treated independently, and other sampling techniques (simple random, systematic, etc.) can be applied within each one to choose the specific elements that will be part of the sample.

Stratification consists of dividing the population into sub-populations or strata and then obtaining independent samples from each stratum. The total sample is thus the union of the sub-samples from all the strata. The strata are usually formed from the variables of activity, size (number of employees) and geographical location (Autonomous Community).
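The selection procedure described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular survey: the frame, the stratum sample sizes and the function name stratified_sample are all hypothetical, and simple random sampling is assumed within each stratum.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, sizes, seed=0):
    """Select an independent simple random sample within each stratum.

    units      -- list of population units (the frame)
    stratum_of -- function mapping a unit to its stratum key
    sizes      -- dict: stratum key -> number of units to select
    """
    rng = random.Random(seed)

    # Divide the frame into strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Sample each stratum independently and pool the sub-samples.
    sample = []
    for key, members in strata.items():
        n_h = min(sizes.get(key, 0), len(members))  # cannot exceed stratum size
        sample.extend(rng.sample(members, n_h))
    return sample


# Hypothetical frame: (id, activity, size class, Autonomous Community).
labels = (
    [("retail", "small", "Madrid")] * 50
    + [("retail", "large", "Madrid")] * 5
    + [("transport", "small", "Galicia")] * 30
)
frame = [(i, *label) for i, label in enumerate(labels)]

# Hypothetical sample size per stratum.
sizes = {
    ("retail", "small", "Madrid"): 10,
    ("retail", "large", "Madrid"): 5,
    ("transport", "small", "Galicia"): 6,
}

# The stratum key is the (activity, size class, region) part of each unit.
sample = stratified_sample(frame, lambda unit: unit[1:], sizes)
print(len(sample))  # 21 = 10 + 5 + 6
```

Because every stratum contributes its own sub-sample, no combination of activity, size and region present in the sizes table can be left out of the total sample by chance.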

The distribution of the sample among the strata (called allocation) is often made proportional both to the size of the stratum (the number of population units in it) and to the standard deviation within the stratum of the variable being considered (the so-called optimal allocation method). This variable must be strongly related to the main variables to be studied in the survey; as these are usually economic variables, the number of workers in the company is usually used for the allocation.
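In symbols, if N_h is the number of population units in stratum h and S_h the standard deviation of the chosen variable in that stratum, this rule (often called Neyman allocation) assigns to stratum h the sample size n_h = n * N_h * S_h / sum over all strata of (N_k * S_k), where n is the total sample size. The sketch below applies this formula; the stratum figures and the function name neyman_allocation are hypothetical, and rounding to whole units is left out for simplicity.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Distribute a total sample of n units across strata in proportion
    to N_h * S_h (stratum size times stratum standard deviation).

    Assumes at least one stratum has a positive N_h * S_h.
    Returns unrounded sample sizes.
    """
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}


# Hypothetical strata defined by company size; S is the standard
# deviation of the number of workers within each stratum (made-up values).
N = {"small": 800, "medium": 150, "large": 50}
S = {"small": 2.0, "medium": 10.0, "large": 60.0}

alloc = neyman_allocation(100, N, S)
for h, n_h in alloc.items():
    print(f"{h:>6}: {n_h:.1f}")
# small: 26.2, medium: 24.6, large: 49.2
```

In this made-up example the small stratum contains most of the population but receives only about a quarter of the sample, because its units are very similar to one another; the large stratum, where variability is highest, receives almost half.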

On some occasions, the population size of a stratum is so small, or the variability of the units that make it up is so wide, that it is necessary to collect information from all of the units included in it. This follows from the allocation of the sample across strata, and also from the fact that the absence of some of these units would introduce errors in the estimates and make the final results of the survey unreliable.
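One common way of handling such strata, sketched here as an assumption rather than as the method of any specific survey, is to distribute the sample in proportion to stratum size times standard deviation and, whenever the share of a stratum reaches its population size, to enumerate that stratum completely and redistribute the rest of the sample among the remaining strata. The figures and the function name allocate_with_take_all are hypothetical.

```python
def allocate_with_take_all(n, stratum_sizes, stratum_sds):
    """Allocate n units proportionally to N_h * S_h; strata whose share
    reaches N_h are fully enumerated and the remainder is re-allocated.

    Assumes positive standard deviations and n <= total population size.
    Returns unrounded sample sizes.
    """
    take_all = {}                    # strata investigated exhaustively
    remaining = dict(stratum_sizes)  # strata still to be sampled
    budget = n                       # sample units not yet assigned

    while remaining:
        weights = {h: remaining[h] * stratum_sds[h] for h in remaining}
        total = sum(weights.values())
        shares = {h: budget * w / total for h, w in weights.items()}

        # Strata whose share would reach or exceed their population size.
        over = [h for h in remaining if shares[h] >= remaining[h]]
        if not over:
            return {**take_all, **shares}

        for h in over:
            take_all[h] = remaining.pop(h)  # sample every unit
            budget -= take_all[h]

    return take_all


# Hypothetical strata (made-up sizes and standard deviations).
N = {"small": 800, "medium": 150, "large": 50}
S = {"small": 2.0, "medium": 10.0, "large": 60.0}

alloc = allocate_with_take_all(120, N, S)
print(alloc["large"])  # 50: every unit of the large stratum is surveyed
```

With a total sample of 120, the first pass would assign about 59 units to the large stratum, which has only 50; the stratum is therefore investigated exhaustively and the remaining 70 units are shared between the other two strata.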