The statistical investigation process, known as the data cycle, is a process where we solve a real-world problem by collecting and analyzing data.

A well formulated question in statistics should be written in a way that it has more than one possible answer. Answering the question should require collecting data (primary data) or finding data that someone else has already collected (secondary data). It should also be clear who the **population** is.

When we write our question, it can be helpful to think about what type of data we will be collecting.

Numerical data is sometimes called **quantitative data** because it is about quantities.

Stem | Leaf |
---|---|
0 | 1 1 2 2 4 5 5 6 6 6 7 7 7 8 8 8 9 9 |
1 | 0 0 1 1 1 3 3 4 6 7 7 7 9 |
2 | 1 3 3 7 |

Key: 1 | 4 = 14 days
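As a rough sketch of how a stem-and-leaf plot is built, the Python below splits each value into a tens digit (the stem) and a ones digit (the leaf). The list `days` holds the values read off the plot in this lesson; the function name `stem_and_leaf` is made up for this example.

```python
from collections import defaultdict

# Individual values read off the plot (in days), e.g. stem 1, leaf 4 -> 14.
days = [
    1, 1, 2, 2, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9,
    10, 10, 11, 11, 11, 13, 13, 14, 16, 17, 17, 17, 19,
    21, 23, 23, 27,
]

def stem_and_leaf(values):
    """Group each value by its tens digit (stem) and keep its ones digit (leaf)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # 14 -> stem 1, leaf 4
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(days)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because every leaf is kept, each individual data value can still be read back from the plot.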

If we are looking for an overall summary of a large data set instead of individual data points, we may group the data into **bins** or **classes** after collecting the data.
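One possible way to group data into bins is sketched below in Python, using the values from the stem-and-leaf plot in this lesson. The bin width of 10 days and the helper name `bin_label` are assumptions chosen for this example; other widths could work just as well.

```python
from collections import Counter

# Values (in days) taken from the stem-and-leaf plot in this lesson.
days = [
    1, 1, 2, 2, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9,
    10, 10, 11, 11, 11, 13, 13, 14, 16, 17, 17, 17, 19,
    21, 23, 23, 27,
]

BIN_WIDTH = 10  # assumed class width; not specified in the lesson

def bin_label(value, width=BIN_WIDTH):
    """Return the label of the bin that contains the value, such as '10-19'."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"

# Count how many values fall in each bin.
counts = Counter(bin_label(d) for d in days)
for label, count in counts.items():
    print(label, count)
```

Grouping gives a quick overall summary, but the individual values inside each bin can no longer be seen.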

Sometimes a data cycle will create more questions like "How much time do cats spend in the shelter?" We can repeat the data cycle with these new populations and collect or acquire more data to try to answer these questions.

Determine if each question would result in numerical data or not. If it is numerical, explain if the collected data could be grouped or if we need to keep the individual data values.

a. What kind of transportation do students at my school use to get to school?

b. How do the heights of students in your class vary?

c. What is a typical score for a hockey team in a single NHL game?

Is each question well formulated for the data cycle? Explain why or why not.

a. How many years has the Boston Red Sox baseball team been around?

b. How do the heights of 7th and 8th graders at my school compare?

c. What is the distribution of ages at the local martial arts studio?

Idea summary

We follow the data cycle to help us formulate questions and use data to answer them.

The questions we ask may lead to **numerical data**. This data may be left as individual data values or grouped when it is displayed.

Well formulated questions should have more than one possible answer and clearly identify the population we are looking to investigate.

When we have questions, we use different ways to collect data to find answers:

- **Observation**: Watching and noting things as they happen
- **Measurement**: Using tools to find out how much, how long, or how heavy something is
- **Survey**: Asking people questions to get information
- **Experiment**: Doing tests in a controlled way to get data. For example, planting two identical plants, giving one sunlight and the other only artificial light, and observing the differences
- Acquire existing **secondary data**: Using data that was collected by a reliable source, like census data, Common Online Data Analysis Platform (CODAP), or National Oceanic and Atmospheric Administration (NOAA) weather data

We should choose a method that is realistic and ethical. This means the method must be possible to carry out and must treat all participants kindly.

Realistic and ethical | Acquiring secondary data from a reliable source like the SPCA |
---|---|
Not ethical | Stealing one dog, putting it in a shelter and seeing how long it takes to get adopted |
Not realistic | Surveying every animal shelter in the US |

It can be too time consuming to survey the whole population, so we can select a **sample**, a smaller group from the population. The process of choosing the people or subjects for the sample is called **sampling**.

The sample should be:

- **representative** of the population, by having the same characteristics
- randomly selected
- big enough to give reliable data
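Random selection can be sketched in Python with the standard library's `random.sample`, which picks items so that each one is equally likely to be chosen. The population of 400 student ID numbers, the sample size of 40, and the function name `simple_random_sample` are all made up for this illustration.

```python
import random

# Hypothetical population: 400 student ID numbers (not from the lesson).
population = list(range(1, 401))

def simple_random_sample(items, size, seed=None):
    """Pick `size` different items, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the same sample repeatable
    return rng.sample(items, size)

sample = simple_random_sample(population, 40, seed=7)
print(sorted(sample))
```

Random selection alone does not guarantee a representative sample, but it avoids choosing only the people who are easiest to reach.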

A larger **sample size** is usually better because it makes a representative sample more likely, simply by including more of the population. However, a larger sample size can be a lot more expensive, time consuming, and difficult to organize. We need to balance a realistic sample size against reliable results.
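The effect of sample size can be explored with a small simulation. The sketch below is only an illustration: the population of 1,000 heights is invented, and the sample sizes of 5 and 100 are arbitrary choices. It repeatedly draws random samples and measures how far each sample mean lands from the population mean, on average.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 made-up heights in centimeters.
population = [rng.gauss(155, 8) for _ in range(1000)]
true_mean = mean(population)

def average_error(sample_size, trials=500):
    """Average distance between a random sample's mean and the population mean."""
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        errors.append(abs(mean(sample) - true_mean))
    return mean(errors)

error_small = average_error(5)
error_large = average_error(100)
print(f"n=5:   average error {error_small:.2f} cm")
print(f"n=100: average error {error_large:.2f} cm")
```

In this made-up setting the larger samples tend to land closer to the population mean, at the cost of collecting twenty times as much data per sample.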

A good sample can be used to make reasonable assumptions about the whole population. A sample that is too small or was not selected randomly can lead to an incorrect conclusion.

In previous grades, we have used line graphs, line plots, stem-and-leaf plots, and circle graphs.

For a short exploration of the data cycle, let the population be your class and explore a question which uses numerical data.

Formulate a question that you could easily collect data on.

Describe a realistic process for collecting the data. Would the sample be representative of the population? Explain.

Collect the data.

Would it make sense to leave the individual data values, or to group them?

Represent the data visually.

What does this data tell you about your original question?

Hannah has chosen to collect information using a sample.

a. What are the advantages of doing a sample? Select all that apply.

A. It is cheaper to conduct.

B. Any sample will represent the population.

C. It is more accurate.

D. It takes less time.

b. What are the disadvantages of doing a sample? Select all that apply.

A. It takes more time.

B. It is more expensive to conduct.

C. It is less accurate.

D. There can be poor sampling.

A middle school principal wants to determine whether students would support adding soccer as a new after school sports team. Anyone who attends the school would be able to join the team.

a. Identify the target population.

b. What method would be best to find out how the students feel about the addition of a new sport?

A. Observation

B. Measurement

C. Survey

D. Acquire secondary data

c. Explain why surveying members of the football team about their preference is not representative of the population.

Donovan wants to explore temperature trends in his hometown over the last 50 years.

a. Formulate a question to help him complete his investigation.

b. Could he use observation, measurement, a survey, an experiment, or acquired secondary data? Explain.

c. Explain how Donovan could collect data that could be used to answer his question from part (a).

d. How do the temperature trends in your hometown compare to Donovan's?

Idea summary

After we formulate a clear question, we use the data cycle to collect, show, and explain information. To get data, we can use methods like:

- Watching **(Observation)**
- **Measuring**
- Asking questions **(Survey)**
- Doing **experiments**
- Acquiring existing **secondary data**

It's important to choose the right method based on the question we have.

When we collect data from a **sample**, we need to make sure that it is representative of the **population**. We can do this by making sure the sample is randomly selected, is big enough, and has the same characteristics as the population.