11. Inferential Statistics

Lesson

Collecting data from every member of a population is the most accurate way of gathering information, but it is not always the most practical and can be very expensive. For these reasons, data is often gathered from a smaller group, or sample, that can be used to **estimate** the characteristics of the wider population.

The size of the sample is an important consideration. If the sample is too large, it may be too expensive or time-consuming to collect the data. If it is too small, the sample may not be representative of the population.

Bias in a sample means that certain characteristics of the population are over- or under-represented. One way to reduce bias is to use some form of **randomisation** in the sampling method.

A random sample is formed by selecting members from the population at random, so that each member of the population has an equal chance of being selected. In simple cases, a sample could be created by drawing names from a hat. For most samples, though, it is more common to use a random number generator.
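As a rough illustration of the "names from a hat" idea, the sketch below draws a simple random sample with Python's standard `random` module. The population of student ID numbers, the function name `simple_random_sample` and the `seed` parameter are all assumptions made for this example.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size distinct members, each with an equal chance of selection."""
    rng = random.Random(seed)  # a fixed seed only makes the example repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: student ID numbers 1 to 200.
student_ids = list(range(1, 201))
chosen = simple_random_sample(student_ids, 10, seed=1)
print(chosen)
```

Because `sample` never picks the same member twice, this behaves like drawing names from a hat without putting them back.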

Generating random numbers

Calculators and spreadsheet applications usually have a **random number generator** that can generate random decimals between $0$ and $1$.

These can be used to create random numbers within any range of values. For example, if we require a random number between:

- $1$ and $50$, we would multiply the randomly-generated decimal by $50$, then round the answer up to the next whole number.
- $20$ and $30$, we would multiply the randomly-generated decimal by $11$ (the number of whole numbers from $20$ to $30$), add $19$, then round the answer up to the next whole number.
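A minimal sketch of scaling a random decimal to a whole-number range, assuming both endpoints are meant to be included. The function name `scale_to_range` is made up for this example. The idea is to multiply by the count of whole numbers in the range, round up, then shift so the smallest result equals the lower endpoint.

```python
import math
import random

def scale_to_range(decimal, low, high):
    """Turn a decimal between 0 and 1 into a whole number from low to high inclusive."""
    count = high - low + 1  # how many whole numbers the range contains
    # Round up, then shift; max() guards the rare case where decimal is exactly 0.
    return max(low, math.ceil(decimal * count) + low - 1)

# random.random() returns a decimal in the interval [0, 1).
print(scale_to_range(random.random(), 1, 50))
print(scale_to_range(random.random(), 20, 30))
```

For the range $20$ to $30$ there are $11$ whole numbers, so the decimal is multiplied by $11$ and the result is shifted by $19$.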

"Do you prefer this rad shirt or the ordinary one on the shelves at the moment?"

Is the question biased or fair?

- A. Biased
- B. Fair

Identify the type of bias.

- A. It is an emotive question.
- B. It uses specific terminology.
- C. It is a leading question.

A school principal wants to estimate the number of students who ride a bicycle to school. Which sample(s) should be used to avoid introducing bias?

- A. All students who are in the school band.
- B. $8$ students in the hallway.
- C. Ten students from each grade, chosen at random.
- D. $130$ students during the lunch periods.

There are several different methods for collecting sample information. The design of the data collection method and the way it is implemented can have a big impact on the quality of the data obtained.

The most common methods include the following:

- Questionnaires and surveys
- Experiments and simulations
- Observational studies
- Data logging (most websites do this automatically)

Data collection methods

Survey - a method for data collection where some members of a population give their responses on certain characteristics, behaviors, or opinions.

Experiment - a method for data collection where a sample is divided into two groups: an experimental group, which receives the treatment being studied, and a control group, which does not.

Observational study - a method for data collection where members of a sample are measured or observed without being altered by the study.
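In an experiment, randomisation is typically applied when deciding who goes into each group. The sketch below shows one possible way to do this, assuming two equal-sized groups. The participant labels and the function name `assign_groups` are hypothetical.

```python
import random

def assign_groups(participants, seed=None):
    """Shuffle the participants, then split them into two halves."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (experimental, control)

# Hypothetical participants labelled P1 to P20.
participants = [f"P{i}" for i in range(1, 21)]
experimental, control = assign_groups(participants, seed=7)
print("Experimental:", experimental)
print("Control:", control)
```

Leaving the allocation to chance means neither the researcher nor the participants can influence which group a person ends up in.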

In any measurement procedure, there is inevitably some error. This is true when an experiment is being done to measure some physical quantity, and it is true in the case of surveys designed to quantify people's voting intentions or brand preferences.

We distinguish two kinds of error: systematic error and random error.

Systematic error occurs when there is something wrong with the measuring apparatus or with the procedure being used. An instrument may not have been correctly calibrated, for example, or a scale may have been consistently misread. In a survey, bias may have been introduced by poor design of the survey questions or by the manner in which the sample of the population to be surveyed was selected.

Once detected, systematic error can usually be reduced by improvements in the experimental procedure and by taking steps to eliminate bias due to the survey design and to the sampling method.

Random error can be thought of as the remaining unexplained error after known sources of systematic error have been removed.

When several careful measurements of the same physical quantity are made, slightly different results may be reported at each trial. This is due to small but uncontrolled fluctuations in the experimental conditions and to the impossibility of reading a measurement scale beyond some level of precision. The reported result of such a measurement is given as the mean of all the observed results together with an indication of how far the observations vary about the mean. The amount of variation is usually given by a statistic called the variance or by its square root, the standard deviation.
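As an illustration, the mean, variance and standard deviation of a set of repeated measurements can be computed with Python's standard `statistics` module. The readings below are invented for the example, and the sample versions of the variance and standard deviation (dividing by $n-1$) are assumed.

```python
import statistics

# Hypothetical repeated readings of the same quantity.
readings = [9.79, 9.82, 9.81, 9.78, 9.83, 9.80]

mean = statistics.mean(readings)
variance = statistics.variance(readings)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(readings)      # square root of the sample variance

print(f"mean = {mean:.3f}, standard deviation = {std_dev:.3f}")
```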

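The precision of the mean of a series of repeated measurements is increased by increasing the number of trials of the experiment. The standard deviation of the individual measurements stays roughly the same, but the standard error of the mean (the standard deviation divided by the square root of the number of trials) becomes smaller.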
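The quantity that shrinks as trials are added is usually called the standard error of the mean. A minimal sketch, assuming independent trials with the same standard deviation (the function name `standard_error` is chosen for this example):

```python
import math

def standard_error(std_dev, n_trials):
    """Standard error of the mean: std_dev divided by the square root of n_trials."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    return std_dev / math.sqrt(n_trials)

# With the same spread in individual readings, more trials give a tighter mean.
for n in (4, 16, 100):
    print(n, standard_error(2.0, n))
```

Note the square root: to halve the standard error, the number of trials has to be multiplied by four.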

In a sample survey, it is to be expected that repeated sampling will give a somewhat different result for each sample. Small samples are more variable than larger ones. Techniques exist for determining how large a sample should be in order to be confident that the true value of the quantity being estimated is reasonably close to the value observed in the sample.
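The effect of sample size on variability can be explored with a small simulation. The sketch below assumes a very large population in which a fixed proportion (here $0.4$, an invented figure) has some characteristic. It draws many samples and measures how much the sample proportions spread out. The function name and all parameter values are illustrative only.

```python
import random
import statistics

def spread_of_sample_proportions(true_proportion, sample_size, n_samples, seed=0):
    """Standard deviation of the sample proportion across many simulated samples."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_samples):
        # Each simulated member has the characteristic with probability true_proportion.
        hits = sum(rng.random() < true_proportion for _ in range(sample_size))
        proportions.append(hits / sample_size)
    return statistics.stdev(proportions)

small = spread_of_sample_proportions(0.4, sample_size=25, n_samples=500)
large = spread_of_sample_proportions(0.4, sample_size=400, n_samples=500)
print(f"spread with n=25: {small:.3f}, spread with n=400: {large:.3f}")
```

The results for samples of $25$ should scatter noticeably more widely around $0.4$ than the results for samples of $400$.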

Polling conducted before an election, for example, often gives the percentage of voters expected to vote for a particular candidate together with a margin of error. The margin of error reflects the variability of the sampling process and depends, in particular, on the size of the sample of the population that was surveyed.
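One commonly used approximation for the margin of error of a sample proportion is $z\sqrt{p(1-p)/n}$, where $p$ is the sample proportion, $n$ is the sample size and $z \approx 1.96$ for roughly $95\%$ confidence. The sketch below assumes a simple random sample that is large enough for the normal approximation to be reasonable. Real polls may use other methods, and the function names are chosen for this example.

```python
import math

def margin_of_error(sample_proportion, sample_size, z=1.96):
    """Approximate margin of error for a proportion (normal approximation)."""
    p = sample_proportion
    return z * math.sqrt(p * (1 - p) / sample_size)

def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest sample size whose approximate margin of error is at most margin."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# A hypothetical poll of 1000 voters with 50% support for a candidate.
print(f"{margin_of_error(0.5, 1000):.3f}")
print(required_sample_size(0.03))
```

Using $p = 0.5$ gives the largest possible margin for a given sample size, so it is a cautious choice when the true proportion is unknown.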

Tina wants to know which new smartphone she should buy and decides to base her decision on other people’s opinions. One Sunday afternoon, she interviews some people at the retirement home where her grandparents live.

Is this an example of Sampling Error or Measurement Error?

- A. Sampling error
- B. Measurement error

What type of Sampling Error is this?

- A. The sample is too small.
- B. The sample is not random.
- C. The sample does not adequately represent the population.

Adults attending a local cinema were asked the following question:

“How many times did you see a movie at this cinema last year?”

Is this an example of Sampling Error or Measurement Error?

- A. Sampling error
- B. Measurement error

What type of Measurement Error is this?

- A. The scale provided is inadequate.
- B. This is an assumption-based question.
- C. Poor and/or leading question wording.

What is the main reason why this question is poor?

- A. Poor and/or leading question wording.
- B. The people being surveyed need more information.
- C. Relies too heavily on respondent memory.

Understand statistics as a process for making inferences about population parameters based on a random sample from that population.

Recognize the purposes of and differences among sample surveys, experiments, and observational studies; explain how randomization relates to each.