
8.01 Two-way frequency tables

Introduction

In 8th grade, we saw two-way frequency tables and relative frequency tables that helped us determine associations in bivariate categorical data. We will extend that practice here and consider joint and marginal frequencies as we analyze data.

Two-way frequency tables

Exploration

Consider the data from a survey:

Students were asked whether they prefer high-top sneakers or low-top sneakers, and they were also asked whether they prefer to shop online or shop in a store. Some of the results from the survey are:

  • The survey included 75 students
  • 30 students said they prefer high-top sneakers more than low-top sneakers
  • 35 students said they prefer to shop online more than in a store
  • 7 students said they prefer high-top sneakers and shopping in a store
  • 33 students said they prefer low-top sneakers and shopping in a store

  1. How many students prefer low-top sneakers more than high-top sneakers?
  2. How many students prefer low-top sneakers and shopping online?
  3. Is there an easier way to display the data in order to answer the first two questions?

A two-way frequency table, or joint frequency table, can be used to summarize bivariate categorical data. It is a data display that shows how many data values fit into multiple categories.

Bivariate categorical data

Data that records two categorical characteristics for each member of a population

Marginal frequency

How often a particular category of data occurred, found in the "total" row and column of a two-way frequency table

Joint frequency

In a two-way table, joint frequency is the number of times a combination of two conditions occurs

Figure: a two-way frequency table of art type by age group.

            Child   Adult   Total
Drawing     22      8       30
Painting    18      7       25
Graphics    9       36      45
Total       49      51      100

The numbers in the Total row and the Total column, except for the last number (100), are the marginal frequencies. The numbers inside the boxes of combined conditions are the joint frequencies. Adding down each column gives the Total row, and adding across each row gives the Total column.

Here we have information about two categories: type of art and age. If we read across each row, we can tell how many of the people who prefer drawing, painting, or graphics are children and how many are adults. If we read down each column, we can tell how many of the children or adults surveyed prefer drawing, painting, or graphics.
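The "add across" and "add down" idea can be sketched in a few lines of Python. This is only an illustration using the counts from the art table; the variable names (`joint`, `row_totals`, `column_totals`) are our own choices and are not part of the lesson.

```python
# Joint frequencies from the art table: rows are art types, columns are age groups.
joint = {
    "Drawing":  {"Child": 22, "Adult": 8},
    "Painting": {"Child": 18, "Adult": 7},
    "Graphics": {"Child": 9,  "Adult": 36},
}

# Adding across a row gives the marginal frequency for that art type.
row_totals = {art: sum(counts.values()) for art, counts in joint.items()}

# Adding down a column gives the marginal frequency for that age group.
column_totals = {
    age: sum(counts[age] for counts in joint.values())
    for age in ("Child", "Adult")
}

# Either set of marginal frequencies adds up to the same grand total.
grand_total = sum(row_totals.values())

print(row_totals)     # {'Drawing': 30, 'Painting': 25, 'Graphics': 45}
print(column_totals)  # {'Child': 49, 'Adult': 51}
print(grand_total)    # 100
```

The nested dictionary mirrors the layout of the table: the outer key picks the row and the inner key picks the column.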

Examples

Example 1

The two-way frequency table shows the number of people at Chili Fest, sorted by the kind of chili they bought and whether or not they added extra spice.

              Added spice   Did not add spice   Total
Meat          331           401                 732
Vegetarian    125           43                  168
Total         456           444
a

Describe the group of people which has exactly 401 people in it.

Worked Solution
Create a strategy

We should find 401 in the table and look at the row and column headings to identify what the number represents.

Apply the idea

The number 401 lies in the row for people who chose meat and in the column for people who did not add spice. This means that there were 401 people who chose meat chili and did not add spice.
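The same lookup can be written as a small search over the joint frequencies. This is a minimal sketch; the helper name `find_cells` is hypothetical and chosen only for this illustration.

```python
# Joint frequencies from the Chili Fest table.
chili = {
    "Meat":       {"Added spice": 331, "Did not add spice": 401},
    "Vegetarian": {"Added spice": 125, "Did not add spice": 43},
}

def find_cells(table, value):
    """Return every (row, column) pair whose joint frequency equals value."""
    return [
        (row, column)
        for row, counts in table.items()
        for column, count in counts.items()
        if count == value
    ]

print(find_cells(chili, 401))  # [('Meat', 'Did not add spice')]
```

The row and column labels returned are exactly the two conditions that describe the group.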

b

Identify whether the number of people who chose vegetarian chili is a marginal or joint frequency and state its value.

Worked Solution
Create a strategy

It is not specified whether this group added extra spice or not, so we are looking at all people who chose the vegetarian chili.

Apply the idea

Since we are looking at the total number of people who chose vegetarian chili, this is a marginal frequency.

There are 168 people who chose the vegetarian chili.

c

Calculate the total number of people who ate chili at Chili Fest.

Worked Solution
Create a strategy

We can find this total in a variety of ways:

  • Add all the joint frequencies
  • Add the marginal frequencies for the type of chili
  • Add the marginal frequencies for whether they added spice or not

Apply the idea

Let's add the marginal frequencies for the type of chili.

\text{Meat}+\text{Vegetarian}=732+168=900

There were 900 people who ate chili at Chili Fest.

Reflect and check

We can check this by adding up all of the joint frequencies.

331+401+125+43=900
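All three ways of finding the total can be checked together. The sketch below assumes the joint frequencies are stored by (chili type, spice choice) pairs; that layout and the variable names are our own, not something the lesson prescribes.

```python
# Joint frequencies from the Chili Fest table, keyed by (chili type, spice choice).
joint = {
    ("Meat", "Added spice"): 331,
    ("Meat", "Did not add spice"): 401,
    ("Vegetarian", "Added spice"): 125,
    ("Vegetarian", "Did not add spice"): 43,
}

# Way 1: add all the joint frequencies.
total_from_joint = sum(joint.values())

# Build the marginal frequencies for each variable from the joint frequencies.
by_chili = {}
by_spice = {}
for (chili_type, spice), count in joint.items():
    by_chili[chili_type] = by_chili.get(chili_type, 0) + count
    by_spice[spice] = by_spice.get(spice, 0) + count

# Way 2 and way 3: add the marginal frequencies for one variable.
total_from_chili = sum(by_chili.values())
total_from_spice = sum(by_spice.values())

print(by_chili)  # {'Meat': 732, 'Vegetarian': 168}
print(by_spice)  # {'Added spice': 456, 'Did not add spice': 444}
print(total_from_joint, total_from_chili, total_from_spice)  # 900 900 900
```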

Example 2

A scientist recorded some data on the flowering pattern and type of soil for a variety of coreopsis plants:

  • 1000 plants were observed
  • 136 plants did not flower
  • 394 plants that did flower were planted in peat soil
  • 30 plants that were planted in sandy soil did not flower
a

Organize the data into a two-way frequency table based on the given information about coreopsis plants.

Worked Solution
Create a strategy

First we should decide what our table's row and column headings should be. The two characteristics we are considering are whether or not the plants flowered and what type of soil they were planted in.

Then we can fill in the given information. Use the fact that the total number of all plants is 1000 and that the marginal frequencies are the sums of the rows and columns to calculate the missing frequencies.

Apply the idea

              Flowered   Did not flower   Total
Peat soil     394
Sandy soil               30
Total                    136              1000

Based on the information we have been given, we can already determine:

  • Marginal frequency for number that flowered: 1000-136=864
  • Joint frequency of peat soil/did not flower: 136-30=106

              Flowered   Did not flower   Total
Peat soil     394        106
Sandy soil               30
Total         864        136              1000

Based on the new calculated information, we can determine:

  • Marginal frequency for number planted in peat soil: 394+106=500
  • Then the marginal frequency for number planted in sandy soil: 1000-500=500
  • Joint frequency for plants that flowered in sandy soil: 864-394=470

              Flowered   Did not flower   Total
Peat soil     394        106              500
Sandy soil    470        30               500
Total         864        136              1000

Reflect and check

We can always check by adding across each row and down each column to ensure the marginal frequency is the sum of the joint frequencies.
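The fill-in steps and the final check can be sketched as follows. The variable names are our own, and the order of the subtractions simply follows the worked solution; other orders would also work.

```python
# Given information about the coreopsis plants.
total = 1000
did_not_flower = 136
peat_flowered = 394
sandy_not_flowered = 30

# Fill in the missing cells in the same order as the worked solution.
flowered = total - did_not_flower                        # 864
peat_not_flowered = did_not_flower - sandy_not_flowered  # 106
peat = peat_flowered + peat_not_flowered                 # 500
sandy = total - peat                                     # 500
sandy_flowered = flowered - peat_flowered                # 470

# Check: each marginal frequency is the sum of the joint frequencies
# in its row or column, and the marginals add to the grand total.
assert peat_flowered + peat_not_flowered == peat
assert sandy_flowered + sandy_not_flowered == sandy
assert peat_flowered + sandy_flowered == flowered
assert peat_not_flowered + sandy_not_flowered == did_not_flower
assert peat + sandy == total == flowered + did_not_flower

print(flowered, peat_not_flowered, peat, sandy, sandy_flowered)  # 864 106 500 500 470
```

If any of the `assert` lines failed, it would signal an arithmetic slip somewhere in the table.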

b

Compare the number of plants that flowered in peat soil to the number that flowered in sandy soil. Explain any conclusions the scientist might make.

Worked Solution
Apply the idea

Notice that there were 500 plants planted in peat soil and 500 plants planted in sandy soil.

However, the number of plants that flowered was not the same in each type of soil. More plants flowered when planted in sandy soil than in peat soil. We can see this because 470 > 394.

The scientist might conclude that coreopsis plants are more likely to flower when planted in sandy soil than when planted in peat soil.
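Because both soil types have 500 plants, comparing the counts directly is fair. As a small extension that is not part of the worked solution, the sketch below turns each count into the fraction of plants in that soil that flowered; the names `counts` and `rates` are assumptions made for this illustration.

```python
# (number that flowered, number planted) for each soil type, from the completed table.
counts = {
    "Peat soil": (394, 500),
    "Sandy soil": (470, 500),
}

# Fraction of the plants in each soil type that flowered.
rates = {soil: flowered / planted for soil, (flowered, planted) in counts.items()}

for soil, rate in rates.items():
    print(f"{soil}: {rate:.1%}")
# Peat soil: 78.8%
# Sandy soil: 94.0%
```

Comparing fractions instead of raw counts would matter more if the two soil groups had different sizes.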

Reflect and check

Note that we do not know under what conditions this data was collected. We do not know if this was set up as an experiment to control other variables, such as sunlight and water. Based on the minimal information given, we cannot say "Being planted in sandy soil causes the plants to flower more."

c

Marvis said because there were a total of 1000 plants observed and 30 plants that were planted in sandy soil did not flower, this means 1000-30=970 were planted in peat soil and did flower. Identify and correct his error.

Worked Solution
Create a strategy

Review the completed two-way table to determine whether the statement is reasonable.

Apply the idea

Marvis seems to have a misconception about how joint frequencies relate to one another. He has stated that \text{sandy and not flowered}+\text{peat and flowered}=\text{Total} while it should be \text{sandy and flowered}+\text{sandy and not flowered}+\text{peat and flowered}+\text{peat and not flowered}=\text{Total}

We need to consider all possible cases, not just opposite cases.

He should have said that because there were a total of 1000 plants observed and 30 plants planted in sandy soil did not flower, this means 1000-30=970 represents the plants that were planted in the peat soil plus the plants that were planted in the sandy soil and flowered.
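A quick check with the completed table shows what 970 does and does not represent. This is only a sketch; the variable names are our own.

```python
# Joint frequencies from the completed coreopsis table.
peat_flowered, peat_not_flowered = 394, 106
sandy_flowered, sandy_not_flowered = 470, 30

total = peat_flowered + peat_not_flowered + sandy_flowered + sandy_not_flowered
remaining = total - sandy_not_flowered  # Marvis's calculation: 1000 - 30

# Marvis's claim: the remaining plants are all "peat and flowered".
print(remaining == peat_flowered)  # False

# What is left is every other cell of the table combined.
print(remaining == peat_flowered + peat_not_flowered + sandy_flowered)  # True
```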

Idea summary

Types of frequencies to consider in a two-way table are:

  • Marginal frequency: How often a particular category of data occurred (or the total row and column data)
  • Joint frequency: How often a combination of two conditions occurred

Outcomes

S.ID.B.5

Summarize categorical data for two categories in two-way frequency tables. Interpret relative frequencies in the context of the data (including joint, marginal, and conditional relative frequencies). Recognize possible associations and trends in the data.
