For a sample to be useful for making inferences about a population, it needs to be sufficiently large and free from undue bias. The sample should be representative of the population, so that any subgroups present in the population appear in the sample in the same proportions. Disproportionate representation of subgroups can lead to bias: the tendency of a sample statistic to systematically over- or under-estimate the corresponding population parameter. Bias can be intentional or unintentional.

The size of the sample is an important consideration. If the sample is too large, it may be too expensive or time-consuming to collect the data. If it is too small, the sample may not be representative of the population.

Let's look at possible sources of error in a sample.

Sampling error

Sampling error is the difference between a statistic found from a sample and the actual value of the population parameter. This can occur due to random variations in different samples or be introduced by a poor sampling method.

When data are collected by repeated random sampling, the subjects selected, and hence the results, are expected to vary somewhat between samples. This type of error can be reduced by taking larger samples.
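The effect of sample size on random sampling variation can be illustrated with a small simulation. This is only a sketch: the population values, the sample sizes and the helper name spread_of_sample_means are invented for illustration and do not come from any real survey.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, repeats, rng):
    """Draw `repeats` simple random samples and return the
    standard deviation of their sample means."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values scattered around 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]

small = spread_of_sample_means(population, 10, 500, rng)
large = spread_of_sample_means(population, 250, 500, rng)

print(f"spread of sample means, n = 10:  {small:.2f}")
print(f"spread of sample means, n = 250: {large:.2f}")
```

The sample means from the larger samples should cluster much more tightly around the population mean than those from the smaller samples.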

A sample can also fail to accurately reflect the population if certain groups are over- or under-represented in it. This can happen when subjects are selected because of some convenience factor: for example, they have a listed telephone number, live in an easily accessible location, or happen to be available when the survey is being done.

Over- or under-representation can also happen when subjects volunteer to be in the survey. Self-selected respondents tend to be those with stronger-than-usual opinions, or with a particular characteristic that distinguishes them from the general population, so their type of response ends up over-represented.

Surveys that are voluntary and in which there are many non-responses will generally under-represent the kinds of responses that might have been given by the non-responding subjects.
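A rough simulation can show how self-selection distorts a result. Everything here is assumed for illustration: the opinion scale, the population and especially the response probabilities in response_chance are made up, not measured.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: opinion scores from 1 (strongly against)
# to 5 (strongly for), each equally common.
population = [rng.choice([1, 2, 3, 4, 5]) for _ in range(20_000)]

# Assumed behaviour: people who feel strongly "for" volunteer far more
# often than anyone else. These probabilities are invented.
response_chance = {1: 0.05, 2: 0.02, 3: 0.02, 4: 0.05, 5: 0.60}

# Voluntary survey: each person decides whether to respond.
volunteers = [x for x in population if rng.random() < response_chance[x]]

# Random survey of the same size, for comparison.
random_sample = rng.sample(population, len(volunteers))


def share_strongly_for(responses):
    """Proportion of responses equal to 5."""
    return sum(1 for x in responses if x == 5) / len(responses)


print(f"population:    {share_strongly_for(population):.0%}")
print(f"volunteers:    {share_strongly_for(volunteers):.0%}")
print(f"random sample: {share_strongly_for(random_sample):.0%}")
```

Under these assumptions the volunteer responses greatly overstate the share of the population that is strongly in favour, while the random sample of the same size stays close to the population figure.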

Randomisation is the key to reducing bias in the selection of a sample. Subjects should be selected randomly in such a way that each member of the population is equally likely to be selected.
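As a minimal sketch, a simple random sample can be drawn with Python's standard random module. The list of 500 ID numbers and the sample size of 25 are hypothetical.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 500 people.
population_ids = list(range(1, 501))

rng = random.Random(2024)  # fixed seed so the run is repeatable

# random.sample selects without replacement, and every member of the
# list has the same chance of ending up in the sample.
sample_ids = rng.sample(population_ids, 25)

print(sorted(sample_ids))
```

In practice the difficult part is usually obtaining a complete list of the population to sample from, not the random selection itself.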

Measurement error

Measurement error is error introduced by the instrument or method used to collect the data, and it is usually unintentional. In a quantitative experiment, an incorrectly calibrated measuring device may systematically skew all results. In qualitative research, the scope for measurement bias is wider and more subtle: it includes poorly designed or leading questions, interview technique, subjective response scales, the survey environment and the confidentiality of results.

We can further categorise sampling and measurement errors as either systematic error or random error.

Systematic error

Systematic error causes all the results to be off from the true value in a consistent manner. This usually occurs when there is something wrong with the measuring apparatus or with the procedure being used, or when there is interference from the environment. For example, an instrument may not have been correctly calibrated, a scale may have been consistently misread, or a disruptive noise affecting all students in a class test may cause their results to be consistently lower than expected.

In a survey, bias may have been introduced by poor design of the survey questions or by the manner in which the sample of the population to be surveyed was selected.

Once detected, systematic error can usually be reduced by improvements in the experimental procedure and by taking steps to eliminate bias due to the survey design and to the sampling method.
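The following sketch illustrates why systematic error has to be fixed at its source. It assumes a hypothetical scale that reads 2.5 grams too high on every measurement, plus a small amount of random scatter; all of the numbers are invented.

```python
import random
import statistics

TRUE_MASS = 100.0  # grams; hypothetical true value
OFFSET = 2.5       # grams; hypothetical calibration fault

rng = random.Random(3)  # fixed seed so the run is repeatable

# Each reading = true value + fixed offset + small random scatter.
readings = [TRUE_MASS + OFFSET + rng.gauss(0, 0.2) for _ in range(200)]

# Averaging many readings smooths out the scatter but not the offset.
mean_reading = statistics.mean(readings)

# Once the fault is detected, recalibrating removes the offset.
corrected = mean_reading - OFFSET

print(f"mean of raw readings: {mean_reading:.2f} g")
print(f"after recalibration:  {corrected:.2f} g")
```

Taking more readings would not help here: the mean stays about 2.5 grams above the true value until the calibration fault itself is corrected.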

Random error

Random error can be thought of as the unexplained error that remains after known sources of systematic error have been removed. It affects results in an unpredictable manner and can be more difficult to correct for.

When several careful measurements of the same physical quantity are made, slightly different results may be reported at each trial. This is due to small but uncontrolled fluctuations in the experimental conditions and to the impossibility of reading a measurement scale beyond some level of precision. The reported result of such a measurement is given as the mean of all the observed results, together with an indication of how far the observations vary about the mean. The amount of variation is usually given by a statistic called the variance or by its square root, the standard deviation.
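As a short illustration, the mean, variance and standard deviation of a set of repeated measurements can be computed with Python's statistics module. The seven readings below are hypothetical.

```python
import statistics

# Hypothetical repeated measurements of one length, in millimetres.
readings = [12.31, 12.29, 12.33, 12.30, 12.32, 12.28, 12.31]

mean = statistics.mean(readings)
variance = statistics.variance(readings)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(readings)      # square root of the sample variance

print(f"mean = {mean:.2f} mm, standard deviation = {std_dev:.3f} mm")
```

Note that statistics.variance and statistics.stdev treat the readings as a sample; statistics.pvariance and statistics.pstdev are the versions that treat the data as a whole population.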

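The precision of the result of a series of repeated measurements is improved by increasing the number of trials of the experiment. More trials do not shrink the standard deviation of the individual measurements, but they do reduce the uncertainty in the mean: the standard error of the mean is the standard deviation divided by the square root of the number of trials.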

In a sample survey, it is to be expected that repeated sampling will give a somewhat different result for each sample. Small samples are more variable than larger ones. Techniques exist for determining how large a sample should be in order to be confident that the true value of the quantity being estimated is reasonably close to the value observed in the sample.

Polling conducted before an election, for example, often reports the percentage of voters expected to vote for a particular candidate together with a margin of error. The margin of error reflects the accuracy of the sampling process and, in particular, the size of the sample that was surveyed.
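One common way to link sample size and margin of error is the normal approximation for a proportion. The sketch below assumes a simple random sample and a 95% confidence level (z = 1.96); neither assumption is stated in the text above, and the function names are hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p from a
    simple random sample of size n: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest n whose approximate margin of error is at most `margin`.
    p = 0.5 is the most cautious choice because it maximises p * (1 - p)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical poll: 1000 respondents, 50% support for a candidate.
moe = margin_of_error(0.5, 1000)
print(f"margin of error: about {moe:.1%}")

# Sample size needed for a margin of error of 3 percentage points.
n_needed = required_sample_size(0.03)
print(f"sample size needed for a 3% margin: {n_needed}")
```

Because the margin of error shrinks with the square root of the sample size, halving it requires roughly four times as many respondents.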

Practice questions

Question 1

A TV station wants to know what the most popular type of music is, so it asks viewers to contact the station and vote for their favourite type of music.

Is the sample chosen biased or fair?
A. Biased
B. Fair

Identify the type of bias involved.
A. The sample is from self-selecting participants, i.e. only those that made an effort to respond.
B. The sample is not representative of the target population.

Question 2

Adults attending a local cinema were asked the following question:

“How many times did you see a movie at this cinema last year?”

Is this an example of sampling error or measurement error?
A. Sampling error
B. Measurement error

What type of measurement error is this?
A. The scale provided is inadequate.
B. This is an assumption-based question.
C. Poor and/or leading question wording.

What is the main reason why this question is poor?
A. Poor and/or leading question wording.
B. The people being surveyed need more information.
C. Relies too heavily on respondent memory.

Outcomes

2.3.4.2

describe sources of error in surveys, including sampling error and measurement error

12.02 Errors in surveys
