7. Statistics

Lesson

One common way of collecting data is through a survey. Conducting a survey involves choosing a question to ask and then recording the answer. This is great for collecting information, but at the end we are left with a long list of answers that can be difficult to interpret.

This is where tables come in. We can use various tables to organize our data so that we can interpret it at a glance.

When conducting a survey, the three main steps are:

- Gathering the data
- Organizing the data
- Interpreting the data

We have looked at what questions we should ask when gathering different types of data. Now we are going to look at how tables can be used to help us organize and interpret data.

Suppose that Melanie wanted to find the least common color of car in her neighborhood. To answer this, she conducted a survey by observing the colors of the cars driving along her street.

By sitting in front of her house and recording the color of the first $20$ cars that drove past, Melanie obtained the following data:

white, black, white, black, black, blue, blue, white, red, white,

white, blue, orange, blue, white, white, orange, red, blue, red

In order to better interpret her data, Melanie converted this list of colors into a frequency table, counting the number of cars corresponding to a particular color and writing that number in the frequency column next to that color.

Car color | Frequency |
---|---|
Red | $3$ |
Black | $3$ |
White | $7$ |
Orange | $2$ |
Blue | $5$ |

Looking at her table, Melanie found that orange was the least common car color in her neighborhood, since it had the lowest frequency.
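Turning a list of answers into a frequency table is a mechanical tally, so Melanie's steps can be sketched in a few lines of Python. This is a minimal sketch, assuming Python 3 and the standard library's `collections.Counter`. The variable names are illustrative choices, not part of the lesson.

```python
from collections import Counter

# Melanie's 20 observations, copied from the list above
cars = [
    "white", "black", "white", "black", "black", "blue", "blue", "white", "red", "white",
    "white", "blue", "orange", "blue", "white", "white", "orange", "red", "blue", "red",
]

# Counter tallies how many times each color appears: this is the frequency table
frequency = Counter(cars)

# Print the table in the same row order as Melanie's table
for color in ["red", "black", "white", "orange", "blue"]:
    print(f"{color:<7}| {frequency[color]}")

# The least common color is the one with the smallest frequency
least_common = min(frequency, key=frequency.get)
print("Least common:", least_common)
```

If two colors had shared the smallest frequency, `min` would report only one of them. A table read by eye avoids this problem, because every row is visible at once.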

Frequency

The frequency of a result is the number of times that it appears in the list of data.

Melanie has answered her initial question, but she realizes she can use the same data to answer other questions about the colors of cars in her neighborhood.

a) What fraction of the cars were black?

We can read from the table that $3$ cars were black. Since Melanie recorded the color of $20$ cars, this means that $3$ out of $20$ of the cars were black. We can express this as the fraction $\frac{3}{20}$.
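The fraction is the result's frequency divided by the total number of results. Below is a minimal sketch assuming Python's standard `fractions.Fraction`, which keeps the value as an exact fraction instead of a decimal:

```python
from fractions import Fraction

black_cars = 3    # frequency of "black", read from the table
total_cars = 20   # Melanie recorded 20 cars in total

# Fraction of the data that this result represents
fraction_black = Fraction(black_cars, total_cars)
print(fraction_black)  # 3/20
```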

b) What was the most common color of car?

Looking at the table, we can see that the result with the greatest frequency is the color white, so this was the most common color. This means that the mode of the data is "white".

Mode

The mode of a data set is the result with the greatest frequency.

If there are multiple results that share the greatest frequency then there will be more than one mode.
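The rule that ties produce more than one mode can be checked with a small helper. The function `modes` is a hypothetical name introduced for this sketch, and the second data set is made up purely to show a tie. The sketch assumes Python 3.7 or later, where `Counter` lists results in the order they first appear.

```python
from collections import Counter

def modes(data):
    """Hypothetical helper: return every result that shares the greatest frequency."""
    frequency = Counter(data)
    greatest = max(frequency.values())
    # Keep every result whose frequency equals the greatest one, so a tie gives several modes
    return [result for result, count in frequency.items() if count == greatest]

# Melanie's car colors: white has the greatest frequency on its own
cars = [
    "white", "black", "white", "black", "black", "blue", "blue", "white", "red", "white",
    "white", "blue", "orange", "blue", "white", "white", "orange", "red", "blue", "red",
]
print(modes(cars))  # ['white']

# A made-up data set with a tie: "red" and "blue" both appear twice
print(modes(["red", "blue", "red", "green", "blue"]))  # ['red', 'blue']
```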

When representing the frequency of different results in our data, we often choose to use a frequency table.

Frequency table

A frequency table communicates the frequency of each result from a set of data. Typically the far left column describes the result or data value and any columns to the right represent frequencies or how many times a result occurred.

As seen in the example above, frequency tables can help us find the least or most common results among categorical data. They can also allow us to calculate what fraction of the data a certain result represents.

When working with numerical data, frequency tables can also help us to answer other questions that we might have about how the data are distributed.

Thomas conducted a survey on the average number of hours his classmates exercised per day and displayed his data in the table below.

Hours of exercise per day | Frequency |
---|---|
$0$ | $2$ |
$1$ | $12$ |
$2$ | $7$ |
$3$ | $5$ |
$4$ | $0$ |
$5$ | $3$ |

How many classmates did Thomas survey?

What is the mode of the data?

How many classmates exercised for less than three hours?

How many classmates exercised for at least three hours?

In the practice question above, Thomas found that there were no classmates who exercised for $4$ hours. Instead of leaving the frequency blank, Thomas put $0$ as the frequency. If he had left this information out of the table, we would not know how many classmates fit this category.

To calculate the total number of data points we can add up all the frequencies. Then to calculate the total for "less than" some number, we add up the frequencies for the results that are less than that number. Similarly, when calculating the total for "at least" some number, we add up the frequencies that are greater than or equal to that number.
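The three rules in the paragraph above can be applied directly to Thomas's table. This is a minimal sketch that stores the table as a Python dictionary mapping hours to frequency. The dictionary layout is a choice made for illustration, not part of the lesson.

```python
# Thomas's frequency table: hours of exercise per day -> number of classmates.
# The row for 4 hours is kept with a frequency of 0.
frequency = {0: 2, 1: 12, 2: 7, 3: 5, 4: 0, 5: 3}

# Total number of data points: add up all the frequencies
total = sum(frequency.values())

# "Less than 3": add the frequencies for the results 0, 1 and 2
less_than_3 = sum(count for hours, count in frequency.items() if hours < 3)

# "At least 3": add the frequencies for results greater than or equal to 3
at_least_3 = sum(count for hours, count in frequency.items() if hours >= 3)

# Every classmate falls into exactly one of the two categories
assert less_than_3 + at_least_3 == total

print(total, less_than_3, at_least_3)
```

Every result is either less than $3$ or at least $3$, so the last two totals must add up to the first. The `assert` line checks this.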

When the data are more spread out, sometimes it doesn't make sense to record the frequency for each separate result and instead we group results together to get a grouped frequency table.

Grouped frequency table

A grouped frequency table combines multiple results into a single group. We can find the frequency of a group by adding all the frequencies of the results contained in that group.

A teacher wants to express the heights (in cm) of her students in a table using the following data points:

$189, 154, 146, 162, 165, 156, 192, 175, 167, 174$

$161, 153, 184, 177, 155, 192, 169, 166, 148, 170$

$168, 151, 186, 152, 195, 169, 143, 164, 170, 177$

She realized that if each result had its own frequency, the table would have too many rows, so instead she grouped the results into intervals of $10$ cm. As a result, her grouped frequency table looked like this:

Height (cm) | Frequency |
---|---|
$140-149$ | |
$150-159$ | |
$160-169$ | |
$170-179$ | |
$180-189$ | |
$190-199$ | |

Complete the frequency table above.

To fill in the frequency for each group, we can count the number of results that fall into the range of each group.

For example, the group $150-159$ will include the results:

$154, 156, 153, 155, 151, 152$

Since $6$ results fall into the range of this group, this group has a frequency of $6$.

Using this method, we can fill in the rest of the grouped frequency table to get:

Height (cm) | Frequency |
---|---|
$140-149$ | $3$ |
$150-159$ | $6$ |
$160-169$ | $9$ |
$170-179$ | $6$ |
$180-189$ | $3$ |
$190-199$ | $3$ |
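The counting described above can be sketched in Python. The sketch first builds the ungrouped frequency table, then adds the frequencies of the results inside each group, as the definition of a grouped frequency table says. It assumes that each group is written as an inclusive `(low, high)` pair, a representation chosen only for this sketch.

```python
from collections import Counter

# The 30 heights (in cm) from the teacher's data
heights = [
    189, 154, 146, 162, 165, 156, 192, 175, 167, 174,
    161, 153, 184, 177, 155, 192, 169, 166, 148, 170,
    168, 151, 186, 152, 195, 169, 143, 164, 170, 177,
]

# Ungrouped frequency table: how often each individual height appears
frequency = Counter(heights)

# Groups of 10 cm, written as (lowest, highest) with both ends included
groups = [(140, 149), (150, 159), (160, 169), (170, 179), (180, 189), (190, 199)]

# The frequency of a group is the sum of the frequencies of the results inside it
grouped = {
    (low, high): sum(count for height, count in frequency.items() if low <= height <= high)
    for low, high in groups
}

for (low, high), count in grouped.items():
    print(f"{low}-{high} | {count}")
```

The printed frequencies should match the completed table row by row. Their sum should equal the $30$ data points.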

Looking at the table, we can see that the modal class is the group $160-169$, since it has the greatest frequency.

By adding the frequencies in the bottom two rows, we can also see that $6$ students were at least $180$ cm tall. There are $30$ students in the class in total, so we now know that $\frac{6}{30}$ of her students, or one fifth of the class, are at least $180$ cm tall.
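Both conclusions drawn from the completed table can be reproduced from the grouped frequencies alone: the modal class and the fraction of students at least $180$ cm tall. This is a minimal sketch in Python, using the standard `fractions.Fraction` so that $\frac{6}{30}$ is reduced automatically. Storing the table as a dictionary keyed by `(low, high)` pairs is an assumption of this sketch.

```python
from fractions import Fraction

# Completed grouped frequency table: (lowest, highest) height in cm -> frequency
grouped = {
    (140, 149): 3,
    (150, 159): 6,
    (160, 169): 9,
    (170, 179): 6,
    (180, 189): 3,
    (190, 199): 3,
}

# Modal class: the group with the greatest frequency (unique here, so max is enough)
modal_class = max(grouped, key=grouped.get)

# Total number of students: add up every frequency
total = sum(grouped.values())

# Only groups whose lowest value is 180 or more contain heights of at least 180 cm
at_least_180 = sum(count for (low, _), count in grouped.items() if low >= 180)

fraction_tall = Fraction(at_least_180, total)

print(modal_class)                          # (160, 169)
print(at_least_180, total, fraction_tall)   # 6 30 1/5
```

This works only because $180$ is the lower end of a group. For a threshold such as $175$ cm, the grouped table cannot tell us how many heights in the $170-179$ group lie above it. That is the loss of precision that grouping causes.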

Modal class

The modal class in a grouped frequency table is the group that has the greatest frequency.

If there are multiple groups that share the greatest frequency then there will be more than one modal class.

As we can see, grouped frequency tables are useful when the data are more spread out. While the teacher could have obtained the same information from an ungrouped frequency table, grouping the results condensed the data into a form that is easier to interpret.

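However, the drawback of a grouped frequency table is that the data become less precise, since we have grouped multiple data points together rather than looking at them individually. For example, the table tells us that $6$ students had heights in the $170-179$ group, but not what those heights were.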

Yvonne asks $15$ of her friends what their favorite color is and writes down their answers. Here is what she wrote down:

blue, pink, blue, yellow, green, pink, pink, yellow, green, blue, yellow, pink, yellow, pink, pink

Count the number of each color and fill in the table.

Color | Number of friends |
---|---|
pink | |
green | |
blue | |
yellow | |

Which color is the mode?

A. pink

B. yellow

C. green

D. blue

A survey of $30$ people asked them how many video games they had played in the past month. Select true or false for each of the following statements:

Number of video games played | Frequency |
---|---|
$0-4$ | $5$ |
$5-9$ | $12$ |
$10-14$ | $9$ |
$15-19$ | $4$ |

a) "We know that $25$ people played $10$ or more video games."

A. True

B. False

b) "We know that $17$ people played $7$ or fewer video games."

A. True

B. False

c) "$21$ people played more than $4$ but less than $15$ video games."

A. True

B. False

d) "The modal class was $5-9$ video games."

A. True

B. False

Summarize numerical data sets in relation to their context.

Report the number of observations.

Describe the nature of the attribute under investigation, including how it was measured and its units of measurement.