The statistical investigation process, known as the data cycle, is a process where we solve a real-world problem by collecting and analyzing data.
A well formulated question in statistics should be written in a way that it has more than one possible answer. Answering the question should require collecting data (primary data) or finding data that someone else has already collected (secondary data). It should also be clear who the population is.
When we write our question, it can be helpful to think about what type of data we will be collecting.
Numerical data is sometimes called quantitative data because it is about quantities.
Stem | Leaf |
---|---|
0 | 1 1 2 2 4 5 5 6 6 6 7 7 7 8 8 8 9 9 |
1 | 0 0 1 1 1 3 3 4 6 7 7 7 9 |
2 | 1 3 3 7 |
Key: stem 1, leaf 4 means 14 days |
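One way to see how a stem-and-leaf plot is built is to sort the values and split each one into a tens digit (the stem) and a ones digit (the leaf). The sketch below is only an illustration: the function name `stem_and_leaf` and the short list `days` are made up for this example and are not the full data set from the plot above.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Split whole numbers into a stem (tens digit) and a leaf (ones digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        plot[value // 10].append(value % 10)
    return dict(plot)


# Hypothetical values: days spent in a shelter (made up for illustration).
days = [14, 1, 23, 7, 10, 2, 27, 16, 5]

# Print one row per stem, with its leaves in increasing order.
for stem, leaves in stem_and_leaf(days).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because the values are sorted first, the leaves in each row come out in increasing order, which matches how the plot is normally read.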
If we are looking for an overall summary of a large data set instead of individual data points, we may group the data into bins or classes after collecting it.
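As a rough sketch of grouping, the values read off the stem-and-leaf plot above can be counted in bins. The bin width of 10 days and the function name `group_into_bins` are assumptions chosen for this example; other widths would work too.

```python
from collections import Counter

# Values read off the stem-and-leaf plot above (in days).
days = [
    1, 1, 2, 2, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9,
    10, 10, 11, 11, 11, 13, 13, 14, 16, 17, 17, 17, 19,
    21, 23, 23, 27,
]

BIN_WIDTH = 10  # assumed class width; this is a choice, not a rule


def group_into_bins(values, width):
    """Count how many values fall in each bin from start to start + width - 1."""
    counts = Counter((value // width) * width for value in values)
    return {start: counts[start] for start in sorted(counts)}


for start, count in group_into_bins(days, BIN_WIDTH).items():
    print(f"{start}-{start + BIN_WIDTH - 1} days: {count}")
```

The grouped counts give a quick summary, but the individual values can no longer be recovered from them.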
Sometimes a data cycle will create more questions like "How much time do cats spend in the shelter?" We can repeat the data cycle with these new populations and collect or acquire more data to try to answer these questions.
Determine if each question would result in numerical data or not. If it is numerical, explain if the collected data could be grouped or if we need to keep the individual data values.
What kind of transportation do students at my school use to get to school?
How do the heights of students in your class vary?
What is a typical score for a hockey team in a single NHL game?
Is each question well formulated for the data cycle? Explain why or why not.
How many years has the Boston Red Sox baseball team been around?
How do the heights of 7th and 8th graders at my school compare?
What is the distribution of ages at the local martial arts studio?
We follow the data cycle to help us formulate questions and use data to answer them.
The questions we ask may lead to numerical data. This data may be left as individual values or grouped when it is displayed.
Well formulated questions should have more than one possible answer and clearly identify the population we are looking to investigate.
When we have questions, we use different ways to collect data to find answers:
Observation: Watching and noting things as they happen
Measurement: Using tools to find out how much, how long, or how heavy something is
Survey: Asking people questions to get information
Experiment: Doing tests in a controlled way to get data. For example, planting two identical plants, giving one sunlight and the other only artificial light, and observing the differences
Acquire existing secondary data: Use data which was collected by a reliable source like census data, Common Online Data Analysis Platform (CODAP), or National Oceanic and Atmospheric Administration (NOAA) weather data.
We should choose a method that is realistic and ethical. This means the method is possible to carry out and does not harm any participants.
Type of method | Example |
---|---|
Realistic and ethical | Acquiring secondary data from a reliable source like the SPCA |
Not ethical | Stealing one dog, putting it in a shelter and seeing how long it takes to get adopted |
Not realistic | Surveying every animal shelter in the US |
It can be too time consuming to survey the whole population, so we can select a sample, a smaller group from the population. The process of choosing the people or subjects for the sample is called sampling.
The sample should be:
representative of the population, by having the same characteristics
randomly selected
large enough to give reliable data
A larger sample size is usually better because it makes a representative sample more likely, simply by including more of the population. However, a larger sample can be much more expensive, time consuming, and difficult to organize. We need to balance a realistic sample size against reliable results.
A good sample can be used to make reasonable assumptions about the whole population. A sample that is too small or was not selected randomly can lead to an incorrect conclusion.
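Random selection can be sketched with a few lines of code. This is only an illustration under stated assumptions: the population of 200 student ID numbers, the sample size of 20, and the function name `choose_sample` are all hypothetical. The `seed` argument is there only so the same sample can be reproduced when checking the work.

```python
import random


def choose_sample(population, sample_size, seed=None):
    """Pick sample_size members at random, with no member chosen twice."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical population: 200 students labelled by ID number.
students = list(range(1, 201))

# Every student has the same chance of being chosen.
sample = choose_sample(students, 20, seed=7)
print(sorted(sample))
```

Choosing members at random gives every student the same chance of being picked, which helps avoid a sample that favors one group, such as only one grade or one sports team.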
In previous grades, we have used line graphs, line plots, stem-and-leaf plots, and circle graphs.
For a short exploration of the data cycle, let the population be your class and explore a question which uses numerical data.
Formulate a question that you could easily collect data on.
Describe a realistic process for collecting the data. Would the sample be representative of the population? Explain.
Collect the data.
Would it make sense to leave the individual data values, or to group them?
Represent the data visually.
What does this data tell you about your original question?
Hannah has chosen to collect information using a sample.
What are the advantages of using a sample? Select all that apply.
What are the disadvantages of using a sample? Select all that apply.
A middle school principal wants to determine whether students would support adding soccer as a new after school sports team. Anyone who attends the school would be able to join the team.
Identify the target population.
What method would be best to find out how the students feel about the addition of a new sport?
Explain why surveying members of the football team about their preference is not representative of the population.
Donovan wants to explore temperature trends in his hometown over the last 50 years.
Formulate a question to help him complete his investigation.
Could he use observation, measurement, survey, experiment, or acquire secondary sources? Explain.
Explain how Donovan could collect data that could be used to answer his question from part (a).
How do the temperature trends in your hometown compare to Donovan's?
After we formulate a clear question, we use the data cycle to collect, show, and explain information. To get data, we can use methods like:
Watching (Observation)
Measuring
Asking questions (Survey)
Doing experiments
Acquiring existing secondary data
It's important to choose the right method based on the question we have.
When we collect data from a sample, we need to make sure that it is representative of the population. We can do this by making sure the sample is randomly selected, is big enough, and has the same characteristics as the population.