7.01 Statistical investigation process

Lesson

Governments, businesses and researchers gather statistics so they can draw conclusions and make decisions about specific issues. If we are planning to use statistics to make an important decision, we need to make sure that we carefully construct a statistical investigation that is appropriate for the situation.

The statistical investigation process begins with the need to solve a real-world problem and aims to reflect the way statisticians work.

As a general approach, the statistical investigation process consists of the following stages:

Statistical investigation process
  1. Identify a problem
    • clarify the problem and formulate questions that can be answered with data
  2. Collect data
    • design and implement a plan to collect or obtain appropriate data
  3. Analyse data
    • select and apply appropriate techniques to analyse the data
  4. Interpret and communicate the results
    • interpret the results of analysis in a way that relates to the original question

The statistical investigation process is often considered to be a cyclical process, as shown in the diagram below, where the results can be used to refine the problem so that new questions can be posed to give further insight.

 

Statistical questions

The statistical investigation process begins with identifying a problem that can be answered with data. The problem can be formulated as a statistical question.

Features of a statistical question

A statistical question must have all of the following features:

  • a variety of possible answers: there has to be more than one possible answer to the question.
  • a stated population: it must apply to more than one person or object. (Note that a population does not have to refer to people.)
  • a need for statistical methods: answering the question must require some statistical methods.

Examples of statistical questions are: “What is the average life expectancy in Australia?”, or “What percentage of cereal boxes contain less than the advertised weight of cereal?”.

On the other hand, these questions are not statistical questions because they lack one or more of the required features: “At what time does the last train depart from central station on a weeknight?”, which has a single answer, and “How old am I?”, which applies only to a single person. Neither of these questions requires the use of statistical methods to answer.

The intended outcome of our investigation is a conclusion that answers the statistical question.

Practice question

QUESTION 1

Which three of the following are statistical questions?

  A. How much do puppies weigh?
  B. Have you ever visited Tokyo, Japan?
  C. How old are Olympic gold medal winners when they win their medal?
  D. What is the wage of people in California?
  E. What superpower would you want to have?
  F. Do you prefer chocolate or popcorn?

 

Collecting or obtaining data

Posing survey questions

The process of collecting data is called a survey, which often involves asking questions, making observations or taking measurements.

It is important to make sure that we ask the right questions when we are collecting data. We need specific research questions that will give us specific responses that will help us to understand the problem that we are trying to address.

The questions posed should lead to data that can be easily organised. It is usually better to use questions that are not open-ended, but rather have a limited number of options from which participants can choose their answers.

For example, instead of asking someone “What is your favourite colour?”, it would be better to ask “Which of the following colours is your favourite?” and to list a few common colours that they can choose from.

Survey questions that tend to elicit responses unfairly favouring one option over another should be avoided because they will lead to response bias.

Practice question

QUESTION 2

Which of these questions are fair?

  A. Do you feel that the TV news is an inaccurate portrayal of life’s problems?
  B. Don't you think this newspaper is biased?
  C. Do you prefer the look and feel of thick lush carpeting in your living room?
  D. Do you take these extra strength multi-vitamins to supplement your diet?
  E. None of them

Once we have posed questions, we need to collect or obtain appropriate data to answer them.

We have to decide on how we will collect the data, the type of data we will collect and the sources from which we will collect them.

 

Census or sample

Our survey can apply to the entire population, which we call a census, or some subset of the population, which we call a sample.

Census or Sample

A census is a type of survey that involves collecting data from the entire population of interest.

A sample is a survey where data is collected from a selection of the population. A sample might be used because it is too time-consuming, too costly or simply impossible to access the entire population.

A sample needs to represent the population from which it is drawn. In other words, we want to make sure there is no selection bias, or unfairness, that could affect our results.

There are different ways to collect a sample.

Sampling methods

With random sampling, each individual is selected at random. In other words, each individual has the same probability of being chosen.

With stratified sampling, the population is divided into subgroups with a common characteristic, and the number from each group in the sample is made to be in proportion to the size of the subgroup.

For systematic sampling, the first individual is chosen randomly and then every $n$th individual that follows is chosen.
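The three sampling methods above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the class roll of 100 students, the two year groups and the function names are all hypothetical, and a fixed seed is used only so the sketch is repeatable.

```python
import random


def simple_random_sample(population, k, rng):
    """Random sampling: every individual has the same chance of being chosen."""
    return rng.sample(population, k)


def systematic_sample(population, n, rng):
    """Systematic sampling: random start among the first n, then every n-th individual."""
    start = rng.randrange(n)
    return population[start::n]


def stratified_sample(strata, k, rng):
    """Stratified sampling: sample each subgroup in proportion to its size.

    Rounding means the total may differ slightly from k for awkward group sizes.
    """
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(k * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample


rng = random.Random(0)  # fixed seed (assumption) so results are repeatable
students = [f"student_{i}" for i in range(1, 101)]  # hypothetical roll of 100 students
year_groups = {"Year 7": students[:60], "Year 8": students[60:]}  # hypothetical subgroups

random_pick = simple_random_sample(students, 10, rng)
systematic_pick = systematic_sample(students, 10, rng)
stratified_pick = stratified_sample(year_groups, 10, rng)
```

With these hypothetical numbers, the stratified sample takes 6 students from the group of 60 and 4 from the group of 40, matching each group's share of the population.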

A sample is biased if certain groups are over- or under-represented in comparison to the population. This can happen when individuals are selected because of some convenience factor (known as convenience sampling). For example, selecting individuals who happen to be available during typical working hours is likely to result in an over-representation of people who are unemployed.

When subjects volunteer to be in the survey, it is referred to as self-selection sampling, which is very likely to result in a biased sample.

Practice questions

QUESTION 3

State whether the following is an instance of a sample or a census:

  1. An election to decide the premier of Queensland.
    A. Sample
    B. Census
  2. Asking a random selection of students in your class whether they approve of the teacher.
    A. Sample
    B. Census
  3. A taste test of a large batch of cookies you have just baked.
    A. Sample
    B. Census
  4. A body scan of randomly selected passengers at Melbourne International Airport.
    A. Sample
    B. Census

QUESTION 4

Choosing every $5$th person on the class roll to take part in a survey is an example of:

  A. Stratified Sampling
  B. Random Sampling
  C. Systematic Sampling

 

Data sources

It is important to state the source of the data used in a statistical investigation. An important choice is whether to collect the data ourselves or to use data that has already been collected or produced by others.

Data sources

Primary data is gathered for the first time by the researcher and involves collecting data by interviewing, observing others or conducting experiments.

Secondary data is data that has been previously collected or generated by others. Common secondary data sources include books and the internet. It is important to be sure that the source is reliable, such as a government organisation.

The choice to use primary or secondary data will depend on factors such as cost and time. If suitable secondary data is available, it will usually be cheaper and quicker than obtaining new data. When secondary data is used, the source should be attributed so that people reading your investigation report can determine that the data is reliable.

 

Practice questions

QUESTION 5

Which of the following are primary data sources?

Select all that apply.

  A. Measurements taken from a scientific experiment that you conducted
  B. Weather observations for the last twelve months in your town, published on the internet by the Bureau of Meteorology
  C. Average incomes of Australian workers published by the Bureau of Statistics
  D. Categorical data that you collected during interviews with local business owners

QUESTION 6

Which of the following secondary data sources would be considered unreliable?

Select all that apply.

  A. Weather observations for the last twelve months in your town, published on the internet by the Bureau of Meteorology
  B. A post on social media that is stating ‘facts’ about the dangers of immunisation
  C. A publication from an investment company that provides data that shows you can make incredible profits from their investment schemes
  D. Climate data for 1998 to 2016 published by NASA

 

Organising and displaying data

A statistical investigation can result in the collection of a large quantity of information. We need to arrange the data we have collected into a form that gives structure and order to the data so that we can identify patterns and understand relationships.

The first step to organising data is often to arrange the data into a table.

For statistical data, we would usually use frequency tables or grouped frequency tables, depending on the type of data that we are collecting.
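As a rough illustration of a frequency table, the sketch below counts how often each answer occurs using Python's `collections.Counter`. The survey responses are made up for the example and do not come from a real survey.

```python
from collections import Counter

# Hypothetical responses to "Which of the following colours is your favourite?"
responses = ["blue", "red", "blue", "green", "red", "blue", "yellow", "green", "blue"]

# Count how many times each answer occurs.
frequency = Counter(responses)

# Print the frequency table, most common answer first.
print(f"{'Colour':<8}{'Frequency':>10}")
for colour, count in frequency.most_common():
    print(f"{colour:<8}{count:>10}")
```

The frequencies always add up to the number of responses collected, which is a quick check that no answer was missed.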

Once we have organised the data, we need to present the data in a form that will be easy to read, understand and analyse.

Some common statistical graphs are:

  • histograms
  • dot plots
  • stem and leaf plots
  • bar charts
  • box and whisker plots

Besides displaying the data in a graph, it may also be beneficial to summarise the data using statistical quantities such as the mean, median, mode, range, standard deviation and variance.
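The statistical quantities listed above can be calculated with Python's built-in `statistics` module. The sketch below uses a hypothetical data set (days absent in a term for 10 students) and treats it as a sample, so it uses the sample forms of standard deviation and variance.

```python
import statistics

# Hypothetical data: number of days absent in a term for 10 students.
days_absent = [0, 1, 1, 2, 2, 2, 3, 4, 5, 10]

summary = {
    "mean": statistics.mean(days_absent),
    "median": statistics.median(days_absent),
    "mode": statistics.mode(days_absent),
    "range": max(days_absent) - min(days_absent),
    # stdev and variance divide by n - 1 (sample formulas);
    # pstdev and pvariance would be used for a whole population (a census).
    "sample standard deviation": statistics.stdev(days_absent),
    "sample variance": statistics.variance(days_absent),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up data the single value of 10 pulls the mean above the median, which is one reason to report more than one summary quantity.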

We will learn more about organising, displaying and analysing data in the following lessons.

 

 

Analysing and interpreting

Once data is organised and displayed in the appropriate form, we are able to interpret the data, to decide on what it means and to ultimately draw conclusions from it.

Analysing can involve identifying trends and patterns in the data, and identifying how those trends and patterns change over time or across categories (such as across different populations). Analysing can also require us to interpret statistical quantities and perform statistical tests.

Practice question

QUESTION 7

Which of the following are only used in analysing the results of a statistical investigation?

Select all that apply.

  A. Identifying clusters and outliers in a histogram
  B. Calculating statistical quantities like mean and median
  C. Using a secondary data source to obtain statistical data
  D. Constructing a histogram from survey results

 

 

Communicating results

In the final stage of the statistical investigation process, we should interpret the results of the analysis and relate the interpretation to the original question.

We should communicate findings in a systematic and concise manner.

Our conclusions could support or reject a proposition in the original question, or they could be inconclusive and indicate that further investigation is required.

Remember!

Conclusions drawn from the statistical investigation should be related to the original question.

It is often the case that the conclusions of a statistical investigation can lead to the identification of new statistical questions that can be investigated to further refine our understanding. Consequently, the statistical investigation process is usually represented as a cyclical process.

Practice question

QUESTION 8

Sophia conducted a statistical investigation seeking to answer the question “What is the most common reason that students are typically absent at her school?”

Which of the following could be appropriate conclusions to the investigation?

Select all that apply.

  A. The rate of student absence is $1.4\%$.
  B. The data was multimodal, so there is no single most common reason for absence from school.
  C. Student attendance at school would be improved by more vaccination programmes to prevent illnesses.
  D. There is evidence to suggest that the most common reason that students are absent from school is illness.

 
