The statistical investigation process begins with the need to solve a real-world problem and aims to reflect the way statisticians work. The data cycle gives us a structure to follow.

To help us formulate or write our question, we can think about whether we will get **univariate data** or **bivariate data**.

Person | Age (years) | Bone density (g/cm³) |
---|---|---|
Alia | 30 | 1.35 |
Boyd | 40 | 1.28 |
Cato | 50 | 1.22 |
Daria | 60 | 1.10 |
Eve | 70 | 0.97 |
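As a rough illustration of the difference between univariate and bivariate data, the sketch below stores the table rows in Python and looks at them both ways. The variable names are assumptions chosen for this example, and the "falls with age" check is just one simple way to look for a relationship, not a formal analysis.

```python
from statistics import median

# Rows from the table: (person, age in years, bone density in g/cm^3)
records = [
    ("Alia", 30, 1.35),
    ("Boyd", 40, 1.28),
    ("Cato", 50, 1.22),
    ("Daria", 60, 1.10),
    ("Eve", 70, 0.97),
]

# Univariate: one attribute per person.
bone_densities = [density for _, _, density in records]
median_density = median(bone_densities)

# Bivariate: two attributes per person, kept together as pairs.
age_density_pairs = [(age, density) for _, age, density in records]

# A simple check for a relationship: does density drop each time age rises?
# The rows are already ordered by age.
density_falls_with_age = all(
    later[1] < earlier[1]
    for earlier, later in zip(age_density_pairs, age_density_pairs[1:])
)

print(median_density)
print(density_falls_with_age)
```

A question such as "What is the typical bone density?" only needs the univariate list, while "How does bone density change with age?" needs the pairs.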

A clear question helps us know what kind of data to gather and who to collect it from. The type of question we ask can lead us to collect different data.

The question we formulate should be a **statistical question**. It must have all of the following features:

Can be answered by collecting data

Anticipates some degree of variability - more than one possible answer

Provides data that can be represented in a visual display - the type of display will depend on the type of data and the question

Statistical question | Not statistical question |
---|---|
How much money do professional female athletes make? | How much money does NCAA star Caitlin Clark make in Name-Image-Likeness deals? |
How long do people keep leftovers in their fridge before eating them? | Does mayonnaise need to be kept in the fridge after opening? |
How much time is spent on social media? | Do you use social media? |
What are the five most popular video games among 8th graders? | Does your best friend play Minecraft? |

Select the question(s) which could be answered by collecting univariate data. Select all that apply.

A. What is the median house price in Virginia?

B. Do I need to file taxes every year as a part-time employee?

C. What is the relationship between reaction time and hours of sleep the previous night?

D. What is the distribution of salaries of professional rugby players in North America?

Determine whether or not each question is a statistical question. Explain why or why not.

a. How many books did my teacher read last year?

b. How many steps do most students in our school walk each day?

c. What is the range of money spent on snacks per week by those who pack a lunch?

Idea summary

**Univariate data** is data where only one attribute or characteristic is collected.

Before we can collect data, we need a clear **statistical question** that:

Can be answered by collecting data

Has some amount of variation - more than one possible answer

Results in data that can be shown in a visual display - the type of display will depend on the type of data and the question
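The variation feature can be sketched in code. The helper below, `has_variability`, is a hypothetical name introduced for this example, and the answers are invented for illustration. It only checks one of the three features, so passing it does not by itself make a question statistical.

```python
def has_variability(responses):
    """Return True when the collected answers include more than one distinct value."""
    return len(set(responses)) > 1

# Hypothetical answers, invented for illustration.
hours_on_social_media = [0.5, 2, 3.5, 2, 1]  # "How much time is spent on social media?"
uses_social_media = ["yes"]                  # "Do you use social media?" asked of one person

print(has_variability(hours_on_social_media))
print(has_variability(uses_social_media))
```

The first question produces a spread of answers that could be shown in a display, while the second produces a single answer.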

To answer a statistical question, we first collect the necessary data. There are several main methods for data collection:

Observation: Watching and noting things as they happen

Measurement: Using tools to find out how much, how long, or how heavy something is

Survey: Asking people questions to get information

Experiment: Doing tests in a controlled way to get data

Acquire existing data: Using a secondary source, usually an online database, to get raw or summarized data.

As we have seen, collecting data from every member of the **population** can be very expensive and take a lot of time. In the United States, every ten years, data is collected from the whole population. This is called the **census**. We may use census data as a secondary source: https://data.census.gov/.

To save time and money, we can collect data from a subset of the population, called a **sample**. However, we need to be sure that our sample is **representative** of the population.

When there is bias in the data cycle, we may get misleading or inaccurate conclusions.

There are a number of ways we can avoid bias in our sample, including:

Having a sample that is large enough to represent the characteristics of the population. In general, the larger a randomly selected sample is, the closer its results tend to be to those of the population.

Having a sample that is selected without deliberately choosing more people from a certain group.

Randomly selecting the sample.
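Random selection can be sketched with Python's standard `random` module. The population of 500 student ID numbers, the sample size of 50, and the helper name `simple_random_sample` are all assumptions made for this example.

```python
import random

# Hypothetical population: ID numbers for 500 students at a school.
population = list(range(1, 501))

def simple_random_sample(members, size, seed=None):
    """Pick `size` distinct members so that each one is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    return rng.sample(members, size)

sample = simple_random_sample(population, 50, seed=2024)
print(len(sample))
```

Because every student has the same chance of being picked, no group is deliberately favored, which is the point of the last two items in the list.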

Once our sample is selected, it is still possible to introduce more bias.

Determine whether each situation demonstrates a sample survey, an experiment, or an observational study.

a. A grocery store wants to know if their customers would use self-checkouts if they were added or if they prefer using the standard checkout lanes that are staffed.

b. A group of students wants to know how different levels of fertilizer affect plant growth.

c. Endangered, wild wolves were reintroduced to Yellowstone National Park. The conservationists want to know if, and by how much, the population of wolves is growing.

A city council wants to determine whether a new skateboard park or a new ice skating rink should be built as the new community building project. The new project will be located in the city park.

a. Identify the target population.

b. What design methodology would be best to find out how the community feels about the two proposed community building projects?

A. Observation

B. Measurement

C. Survey

D. Acquire secondary data

c. Explain why using the local ice hockey team as the sample would not give representative data.

For each survey question and sample, determine whether the results are likely to be biased or not. Explain your answer.

a. To answer the question "How much time do students at my school spend practicing a musical instrument per week?", Yvonne surveys the people in her jazz quartet.

b. To answer the question "What range of speeds do people drive on I-95 throughout the day?", Lachlan uses a radar gun to observe and measure the speeds of 100 cars in the right lane between the hours of 8 AM and 9 AM.

c. To answer the question "How much rain does Middleburg, VA get per month?", Tricia uses historical weather data from a reliable source for the past 20 years.

Idea summary

After we formulate a clear statistical question, we use the data cycle to collect, show, and explain information. To get data, we can use methods like:

Watching **(Observation)**

**Measuring**

Asking questions **(Survey)**

Doing **experiments**

Acquiring existing **secondary data**

If the sample is representative of the population, the data may be used to understand the population. There are a number of potential sources of bias including:

A sample that does not resemble the population.

A sample that is too small to be representative.

A sample that is not randomly selected, such as a convenience sample.
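To see how a convenience sample can mislead, the sketch below compares the mean for a whole (very small) population with the mean for a hand-picked group. All names and numbers are hypothetical and invented for illustration only.

```python
from statistics import mean

# Hypothetical weekly practice hours for ten students at a school.
# The data are invented for illustration only.
practice_hours = {
    "student_01": 0, "student_02": 0, "student_03": 0, "student_04": 1,
    "student_05": 0, "student_06": 2, "student_07": 6, "student_08": 7,
    "student_09": 5, "student_10": 8,
}

# Convenience sample: only the four students who happen to be in the band.
band_members = ["student_07", "student_08", "student_09", "student_10"]

population_mean = mean(list(practice_hours.values()))
convenience_mean = mean([practice_hours[name] for name in band_members])

print(population_mean)   # mean for all ten students
print(convenience_mean)  # mean for the band members only
```

In this made-up data, the band members practice far more than the typical student, so a conclusion drawn from them alone would overstate practice time for the school.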