
9.01 Formulate questions and collect data

Formulate questions

The data cycle is the process where we formulate questions, then collect, display, and explain mathematical data.

A data cycle with four stages. At the top, there is Formulate questions represented by a speech bubble with a question mark. To the right, Collect or acquire data is shown with an icon of a person and a magnifying glass. At the bottom, Organize and represent data is illustrated with a dot plot. To the left, Analyze and communicate results is indicated by a person with charts. Clockwise arrows are drawn from one stage to the next.

To help us formulate or write our question, we can think about whether we will get categorical data or numerical data.

Categorical data

Data that can be put in categories and may not have a specified order.

It can be displayed in pictographs, line plots, and bar graphs.

Example:

Colors or types of food

A column graph titled Fruit for Lunch. The y axis labeled Number with a scale from 0 to 10. Four types of fruit are listed on the x axis: Banana, Apple, Mandarin and Pineapple.  The graph shows bars of varying heights representing the number of each fruit: Banana has a count of 2, Apple 3, Mandarin is the highest at around 9, and Pineapple 1.

This column graph shows the fruits that students have in their lunch one day.

Notice that we have four categories: Banana, Apple, Mandarin, and Pineapple. The graph helps us count how many items fall in each category.
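As a rough illustration of how a graph like this counts items in each category, the sketch below tallies a list of lunch fruits in Python. The list `lunch_fruits` is an assumption built to match the bar heights described for the graph (the Mandarin bar is only described as around 9). It is not the original survey data.

```python
from collections import Counter

# Hypothetical responses, built to match the bar heights described above.
lunch_fruits = (
    ["Banana"] * 2 + ["Apple"] * 3 + ["Mandarin"] * 9 + ["Pineapple"] * 1
)

# Counter tallies how many times each category appears.
fruit_counts = Counter(lunch_fruits)

for fruit, count in fruit_counts.items():
    print(f"{fruit}: {count}")

# The category with the highest count.
most_common_fruit, highest_count = fruit_counts.most_common(1)[0]
print("Most common:", most_common_fruit)
```

Each count corresponds to the height of one bar, and the most common category corresponds to the tallest bar.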

Discrete numerical data

Data that can only take certain values and has a limited range of values.

It can be displayed in line plots, stem-and-leaf plots, and line graphs.

Example:

Shoe size or number of siblings

A dot plot on a number line. The number line is labeled with mixed numbers at every half unit from 5 and a half to 8. The plot includes the following data points: two dots above 5 and a half, three dots above 6, four dots above 6 and a half, one dot above 7, and two dots above 7 and a half. There are no dots above whole numbers 5, 7, or 8.

This line plot shows the shoe sizes of a group of students.

Notice that responses are restricted to possible shoe sizes, which are discrete numerical. We can see how many people wear each size from the line plot.

A clear question helps us know what kind of data to gather and who to collect it from. The type of question we ask can lead us to collect different data.

The group we are hoping to answer the question about is called the population.

When we write a question, it should be about the population we want to learn about and have more than one possible answer.

Non-examples of questions    | Examples of well formulated questions
How old is my neighbor?      | What ages are the people in my neighborhood?
What brand is my computer?   | What is the most popular brand of computer among my classmates?
What is your favorite color? | What colors are preferred by students in my neighborhood?

It needs to be clear which attributes we are exploring with our question. An attribute is a specific characteristic or feature of a given subject.

For example, if we want to learn more about pets in our class, we need to be clear which attribute we are interested in. These could include:

  • Number of pets

  • Type of pets

  • Size of pets

  • Age of pets

Examples

Example 1

All the students in your school take a survey with these questions.

a

Which question(s) will have discrete numerical data as their results?

A
How many pets do 6th graders have?
B
In your class, how many bones has each student broken?
C
How long does it take students at my school to get to school every day?
D
What kinds of pets are owned by my classmates?
Worked Solution
Create a strategy

Consider the answers to these questions. If the answers are numbers, would they be found from counting (discrete), or would they be found from measuring (continuous)?

Apply the idea

Option A requires counting the number of pets.

Option B requires counting the number of bones each student has broken.

Option C requires measuring time.

Option D would be word answers.

So options A and B would have discrete numerical data as their results.

Reflect and check

For option C, if the question asked "to the nearest 5 minutes", it could be discrete.

b

Which question(s) will have categorical data as their results?

A
How many cousins do my classmates have?
B
What country was your most recent vacation in?
C
How many people are in the average residence in Virginia?
D
How tall are 6th graders?
E
What middle school grade level has the most students?
Worked Solution
Create a strategy

Choose a question where the responses of each person would be a category.

Apply the idea

Option A requires each student to state or count the number of cousins they have.

Option B answers would be a country name.

Option C would require counting the number of people in each residence.

Option D would require measuring a numerical height.

Option E would require each student to state their grade level. Although grades can be written as numbers, they are really categories. For example, in high school, grade 9 could also be called Freshman which is not numerical.

So options B and E would have categorical data as their results.

Reflect and check

For Option B, if a very large number of countries were reported, the data could be analyzed by further categorizing into continents or another classification.
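The counting-versus-measuring check used in Example 1 can be sketched as a small Python function. This is only a rough rule of thumb, not a definitive classifier: the function name `classify_responses`, the flag `numbers_are_labels`, and all the sample responses are hypothetical and made up for illustration.

```python
def classify_responses(responses, numbers_are_labels=False):
    """Guess the data type from a list of survey responses.

    numbers_are_labels: set to True when numbers only name groups
    (for example, grade levels), so they are treated as categories.
    """
    if numbers_are_labels or any(isinstance(r, str) for r in responses):
        return "categorical"
    # Whole-number answers usually come from counting.
    if all(isinstance(r, int) for r in responses):
        return "discrete numerical"
    # Anything else (such as 12.5 minutes) comes from measuring.
    return "continuous numerical"


# Hypothetical responses for some of the questions in Example 1.
pets_owned = [0, 2, 1, 3, 0]                 # counting pets
travel_minutes = [12.5, 8.0, 20.25, 15.75]   # measuring time
vacation_country = ["Mexico", "Canada", "Japan"]
grade_level = [6, 7, 8, 6]                   # numbers used as labels

print(classify_responses(pets_owned))
print(classify_responses(travel_minutes))
print(classify_responses(vacation_country))
print(classify_responses(grade_level, numbers_are_labels=True))
```

The flag is needed because a computer cannot tell on its own that grade levels are categories. That decision comes from thinking about the question, as in part (b).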

Example 2

Is each question well formulated for the data cycle? Explain why or why not.

a

Who was the first president?

Worked Solution
Create a strategy

A well formulated question should have a variety of possible answers and relate to a specific population.

Apply the idea

There is one answer and no clear population, so this is not a well formulated question for the data cycle.

Reflect and check

A related question we could use the data cycle for is "What is the most common term length for a US president?"

b

How do the shoe sizes of 5th and 6th graders at my school compare?

Worked Solution
Create a strategy

A well formulated question should have a variety of possible answers and relate to a specific population.

Apply the idea

This is a well formulated question as shoe size is a clear attribute with different possible answers.

Reflect and check

This data would be discrete numerical.

c

How much money do professional athletes in the US make?

Worked Solution
Create a strategy

There is a clear population of professional athletes in the US. Now we need to check if data could be collected to give a variety of answers.

Apply the idea

Different athletes will make different amounts, so this is something we could find data on. This is a well formulated question.

Idea summary

We use the data cycle to formulate questions, then collect, show, and explain information. Depending on the question being asked, the data may be categorical data or numerical data.

A well formulated question should have more than one possible answer and relate to a population.

Collect data

When we have questions, we use different ways to collect data to find answers:

  • Observation: Watching and noting things as they happen.

    • For example, watching birds at a feeder to see which type comes most often.

  • Measurement: Using tools to find out how much, how long, or how heavy something is.

    • For example, using a ruler to measure the growth of a plant over several weeks.

  • Survey: Asking people questions to get information.

    • For example, asking classmates about their favorite school subject and recording the answers.

  • Experiment: Doing tests in a controlled way to get data.

    • For example, planting two identical plants, giving one sunlight and the other only artificial light, and observing the differences.
  • Acquire existing secondary data: Using data that was collected by a reliable source, such as census data, the Common Online Data Analysis Platform (CODAP), or peer-reviewed studies.

To help us choose a method, we need to be sure that it is realistic based on our sample. For example, it might be too time consuming to do an experiment in an hour, so we can acquire existing data instead.

Our chosen method must also be ethical. This means that no one gets hurt, asked inappropriate questions, or experimented on without consent.

When doing a survey or using secondary sources, it is important that the data is collected from a sample that is representative of the population, so that our analysis of the data is valid.

Representative means that the characteristics of the sample are similar to the characteristics of the population.

A concept of sampling from a population. On the left is a large circle labeled Population containing many diverse cartoon faces representing individuals. On the right is a smaller circle labeled Sample with a subset of the faces from the population, connected by an arrow indicating selection from the larger group to the smaller.

For example, if we are collecting data to answer the question "What is the most popular restaurant in my city?" and only surveyed people at our favorite Mexican restaurant, then this sample would not include people who prefer other types of food.

In general, the larger our sample is, the more likely it is to be a good representation of the population.
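The effect of sample size can be explored with a short simulation. The sketch below is only an illustration: the population of 1000 people and the 300 who prefer Mexican food are made-up numbers, and the function `sample_share` is a hypothetical helper.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 1000 people, 300 of whom prefer Mexican food.
population = ["Mexican"] * 300 + ["Other"] * 700
true_share = population.count("Mexican") / len(population)


def sample_share(sample_size):
    """Share of a random sample that prefers Mexican food."""
    sample = random.sample(population, sample_size)
    return sample.count("Mexican") / sample_size


for size in [10, 100, 1000]:
    print(f"Sample of {size}: share = {sample_share(size):.2f}")

print(f"Whole population: share = {true_share:.2f}")
```

Small samples can land far from the population share, while a sample that includes everyone matches it exactly. The exact values printed for the smaller samples depend on the random seed.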

Exploration

In previous grades, we have used pictographs, bar graphs, line graphs, line plots, and stem-and-leaf plots.

For a short exploration of the data cycle, let the population be your class.

  1. Formulate a question that you could easily collect data on.

  2. What type of data would be collected: categorical or discrete numerical?

  3. Describe a realistic process for collecting the data.

  4. Collect the data.

  5. Represent the data visually.

  6. What does this data tell you about your original question?

Examples

Example 3

Aditya wants to investigate the social media habits of the students in her grade.

a

Formulate a question to help her complete this investigation.

Worked Solution
Create a strategy

She could choose to explore which platforms are used, how much they are used, reasons for usage, and the impact on their academic and social lives.

Apply the idea

A potential question could be, "How many hours per week do students in my grade spend on social media?"

Reflect and check

The question can be answered by collecting data, allows for a variety of answers, and could be organized in a data display like a line plot if answers are rounded to the nearest hour, so it is a well formulated question.

b

What attributes would you need to measure to answer the question?

Worked Solution
Create a strategy

The question from part (a) is well written as it clearly states the attribute.

Apply the idea

For each person, she would need to identify the total time they spend across all social media platforms per week.

Reflect and check

When collecting the data, Aditya could collect the data rounded to the nearest hour to make it discrete or could leave it open-ended for more options when displaying or summarizing data.

c

Should she use observation, measurement, survey, or experiment to collect the data? Explain.

Worked Solution
Create a strategy

She should choose a method that is practical, ethical, and will give reliable results.

Apply the idea

A survey would be the most appropriate method for this investigation.

Reflect and check

The population is relatively small, so it would be possible to ask them all a single question about their social media use.

Observing her classmates all day to make conclusions about their social media use would likely not be ethical or easy to do.
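Part (b) of Example 3 mentions rounding answers to the nearest hour to make the data discrete. A minimal sketch of that rounding step is shown below. The helper `round_to_nearest` and the list `reported_hours` are hypothetical and made up for illustration.

```python
def round_to_nearest(value, step):
    """Round a non-negative value to the nearest multiple of step.

    Values exactly halfway between two multiples round up.
    """
    return int(value / step + 0.5) * step


# Hypothetical weekly social media times, in hours, as students might report them.
reported_hours = [3.2, 7.5, 10.75, 0.4, 5.0]

# After rounding to the nearest hour, only whole-number answers are possible.
rounded_hours = [round_to_nearest(h, 1) for h in reported_hours]
print(rounded_hours)  # [3, 8, 11, 0, 5]
```

The same helper could round travel times to the nearest 5 minutes, for example `round_to_nearest(13, 5)` gives 15.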

Example 4

Collect data that can be used to answer the question "How many first cousins do students in my school have?"

Worked Solution
Create a strategy

This situation would require a survey. Depending on the size of your school, it could be done with a sample or you could survey the whole population.

It is not possible to do observation, measurement, or an experiment for this question.

Apply the idea

We gave five surveys to each teacher in the school and asked them to randomly select five students from their class to do them. The results were recorded in this table.

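         | Student 1 | Student 2 | Student 3 | Student 4 | Student 5
Class 6A | 1         | 3         | 3         | 9         | 16
Class 6B | 2         | 3         | 8         | 21        | 23
Class 7A | 5         | 5         | 11        | 12        | 15
Class 7B | 1         | 3         | 4         | 6         | 32
Class 8A | 4         | 5         | 9         | 12        | 14
Class 8B | 6         | 8         | 10        | 15        | 22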
Reflect and check

Enough students were surveyed for the results to be representative of the population. However, if possible, surveying more students would give more accurate results.

If we were trying to look at this information for the whole state or beyond, a survey would not be practical and we would need to use a reliable secondary data source. A single data source without supporting references would not be reliable.
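The selection step in Example 4, where five students are randomly chosen from each class, can be sketched in Python. The class rosters below are hypothetical (25 placeholder students per class), since the real class lists are not given.

```python
import random

# Hypothetical class rosters; the real classes and names are not given.
class_names = ["6A", "6B", "7A", "7B", "8A", "8B"]
rosters = {
    name: [f"{name} student {i}" for i in range(1, 26)]  # 25 students each
    for name in class_names
}

STUDENTS_PER_CLASS = 5

# random.sample picks without repeats, so no student is surveyed twice.
selected = {
    name: random.sample(students, STUDENTS_PER_CLASS)
    for name, students in rosters.items()
}

total_surveyed = sum(len(group) for group in selected.values())
print("Students surveyed:", total_surveyed)  # 30
```

Choosing the same number of students from every class means each class is included in the sample.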

Example 5

Georgia wants to know about the current employment of Americans. She randomly selects 10 adults who came to pick up or drop off students from her 6th grade class to survey.

Determine the factors that could mean that the data collected is not representative of the population.

A
The sample won't include a variety of ages of all those who are in the workforce.
B
The adults in the sample were not randomly selected.
C
The survey question is open ended.
D
The sample size is too small for such a big population.
Worked Solution
Create a strategy

The people who are surveyed should have similar characteristics to the population of interest, which in this case is all Americans.

Apply the idea

For option A, this sample likely wouldn't include any teenagers or other groups that are in the workforce, so this is a factor that will lead to unrepresentative data.

In option B, the statement that the adults in the sample were not randomly selected is incorrect. Georgia did randomly select adults, but they were selected from a non-representative subgroup.

In option C, the survey question style does not directly relate to the representativeness of the data regarding employment.

In option D, we see the main issue: the sample size of 10 is too small to be representative of such a large and diverse population as "Americans."

The correct answers are options A and D.

Reflect and check

This question might be better answered using secondary data like census data or other non-profit research as collecting data from a representative sample would take a very long time.

Example 6

A sample of 25 people is drawn from a population. In this sample, the youngest is 18 years old, and the oldest is 64.

Which two of these might be populations that this sample was drawn from?

A
Residents of a retirement home
B
Employees at a bank
C
Students from an elementary school
D
Drivers stuck in traffic during rush hour
Worked Solution
Create a strategy

Remember that a sample is a smaller group than the entire population but every member of the sample must be a member of the overall population.

Apply the idea

Which of these populations would be likely to include people of this age group?

The answers are option B, Employees at a bank, and option D, Drivers stuck in traffic during rush hour.

Reflect and check

To check our answer, we can confirm that the other options are incorrect. For option A, residents of a retirement home must be at least retirement age (usually 65). For option C, the students attending an elementary school will only range from about ages 5 to 10.
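The reasoning in Example 6 can be sketched as a range check: a population is only possible if every age in the sample could belong to it. In the sketch below, the retirement home and elementary school ranges follow the ages mentioned above, while the upper limits and the ranges for bank employees and drivers are assumptions made for illustration.

```python
# Sample described in Example 6.
sample_youngest, sample_oldest = 18, 64

# Rough age ranges for each population. The retirement home and
# elementary school ranges follow the reasoning above; the bank and
# driver ranges (and all upper limits) are assumptions for this sketch.
population_age_ranges = {
    "Residents of a retirement home": (65, 110),
    "Employees at a bank": (18, 75),
    "Students from an elementary school": (5, 10),
    "Drivers stuck in traffic during rush hour": (16, 95),
}

# Keep only populations whose age range contains the whole sample.
possible_populations = [
    name
    for name, (youngest, oldest) in population_age_ranges.items()
    if youngest <= sample_youngest and sample_oldest <= oldest
]

print(possible_populations)
```

Only the bank employees and the drivers have age ranges that contain both 18 and 64.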

Idea summary

After formulating a clear question, we use the data cycle to collect, show, and explain information. To get data, we can use methods like:

  • Watching (Observation)

  • Measuring

  • Asking questions (Survey)

  • Doing experiments

  • Acquiring existing secondary data

It's important to choose the right method based on the question we have.

Whichever method we use, we need to ensure that we collect data from a sample that is representative of the population.

Outcomes

6.PS.1

The student will apply the data cycle (formulate questions; collect or acquire data; organize and represent data; and analyze data and communicate results) with a focus on circle graphs.

6.PS.1a

Formulate questions that require the collection or acquisition of data with a focus on circle graphs.

6.PS.1b

Determine the data needed to answer a formulated question and collect the data (or acquire existing data) using various methods (e.g., observations, measurement, surveys, experiments).

6.PS.1c

Determine the factors that will ensure that the data collected is a sample that is representative of a larger population.
