Australia, VIC
VCE 12 General 2023

2.03 Two-way tables and graphs

Lesson

Row and column percentages

Two-way tables allow us to display and examine the relationship between two categorical variables. The categories of one variable are labelled across the top of the table and the categories of the other down the left side, and the frequency of each combination of categories appears in the interior of the table. Often the totals of each row and column are also shown.

The table below shows the numbers of passengers and crew who survived or died on the Titanic's fateful maiden voyage in 1912.

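| | First Class | Second Class | Third Class | Crew | Total |
|---|---|---|---|---|---|
| Survived | 202 | 118 | 178 | 212 | 710 |
| Died | 123 | 167 | 528 | 696 | 1514 |
| Total | 325 | 285 | 706 | 908 | 2224 |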

Although it is interesting to know that 202 First Class passengers survived, it is far more useful to know the percentage breakdown of survivors across the classes. To do this we calculate row percentages: divide each value in the table by its row total and multiply by 100\%. The row percentages are shown in the table below, with the working shown in some of the cells.

| | First Class | Second Class | Third Class | Crew | Total |
|---|---|---|---|---|---|
| Survived | \dfrac{202}{710} \times 100\% \approx 28\% | \dfrac{118}{710} \times 100\% \approx 17\% | 25\% | 30\% | 100\% |
| Died | \dfrac{123}{1514}\times 100\% \approx 8\% | 11\% | 35\% | 46\% | 100\% |

This percentage frequency table gives us more useful information than the raw data. The first row gives the percentage breakdown of survivors by class, and the second row gives the breakdown of deaths by class. We can now easily read that 46\% of the people who died were crew members, whereas only 8\% of the people who died were in first class.
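The row-percentage calculation can be sketched in Python. This is a minimal illustration rather than part of the lesson: the function names `row_percentages` and `round_half_up` are our own, and we assume "nearest percentage" means rounding .5 up, because Python's built-in `round()` rounds halves to the nearest even number.

```python
import math


def round_half_up(x: float) -> int:
    """Round a non-negative number to the nearest whole number, with .5 rounding up."""
    return math.floor(x + 0.5)


def row_percentages(table: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Divide each cell by its row total, multiply by 100 and round to the nearest percent."""
    result = {}
    for row_label, cells in table.items():
        row_total = sum(cells.values())  # e.g. 710 for the Survived row
        result[row_label] = {
            col: round_half_up(count / row_total * 100) for col, count in cells.items()
        }
    return result


# Counts from the Titanic table (row totals are recomputed, not stored)
titanic = {
    "Survived": {"First Class": 202, "Second Class": 118, "Third Class": 178, "Crew": 212},
    "Died": {"First Class": 123, "Second Class": 167, "Third Class": 528, "Crew": 696},
}

for row_label, percents in row_percentages(titanic).items():
    print(row_label, percents)
```

Running it prints the same rounded values as the row percentage table: 28, 17, 25 and 30 for Survived, and 8, 11, 35 and 46 for Died.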

We can also calculate the percentage of people in each class who survived or died. To do this we calculate column percentages: divide each value in the table by its column total and multiply by 100\%. The column percentages are shown in the table below, with the working shown in some of the cells.

| | First Class | Second Class | Third Class | Crew |
|---|---|---|---|---|
| Survived | \dfrac{202}{325} \times 100\% \approx 62\% | \dfrac{118}{285} \times 100\% \approx 41\% | 25\% | 23\% |
| Died | \dfrac{123}{325}\times 100\% \approx 38\% | 59\% | 75\% | 77\% |
| Total | 100\% | 100\% | 100\% | 100\% |

This percentage frequency table also gives us more useful information. The first column gives the percentage breakdown of survivals and deaths in first class. From the raw data we can see that a similar number of first class (202) and third class (178) passengers survived. However, this comparison is misleading, because there were far more third class passengers (706) than first class passengers (325). The percentage frequency table shows that 62\% of first class passengers survived, whereas only 25\% of third class passengers survived.
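A matching sketch for column percentages divides by column totals instead of row totals. As before, `column_percentages` and `round_half_up` are illustrative names of our own, and rounding halves up is an assumption. The last lines reproduce the comparison above: similar survivor counts, very different survival percentages.

```python
import math


def round_half_up(x: float) -> int:
    """Round a non-negative number to the nearest whole number, with .5 rounding up."""
    return math.floor(x + 0.5)


def column_percentages(table: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Divide each cell by its column total, multiply by 100 and round to the nearest percent."""
    columns = list(next(iter(table.values())).keys())
    column_totals = {col: sum(cells[col] for cells in table.values()) for col in columns}
    return {
        row_label: {
            col: round_half_up(cells[col] / column_totals[col] * 100) for col in columns
        }
        for row_label, cells in table.items()
    }


titanic = {
    "Survived": {"First Class": 202, "Second Class": 118, "Third Class": 178, "Crew": 212},
    "Died": {"First Class": 123, "Second Class": 167, "Third Class": 528, "Crew": 696},
}

percents = column_percentages(titanic)
for cls in ("First Class", "Third Class"):
    survived = titanic["Survived"][cls]
    print(f"{cls}: {survived} survived, which is {percents['Survived'][cls]}% of the class")
```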

Examples

Example 1

Glen surveyed all the students in Year 12 at his school and summarised the results in the following table:

| | Play netball | Do not play netball | Total |
|---|---|---|---|
| \text{Height} \geq 170 \text{ cm} | 33 | 72 | 105 |
| \text{Height} \lt 170 \text{ cm} | 13 | 30 | 43 |
| \text{Total} | 46 | 102 | 148 |

a

Which variable is the explanatory variable?

A
Play netball
B
Height
Worked Solution
Create a strategy

Choose the variable that, when it changes, may affect the other variable.

Apply the idea

A student's decision to play netball may depend on their height, whereas their height will not change whether or not they play netball. So height is the explanatory variable, option B.

b

To examine if there is an association between height and playing netball, should Glen use a column or row percentage frequency table?

Worked Solution
Create a strategy

Calculate the percentages within each category of the explanatory variable, so that the percentages for each of its categories add up to 100\%.

Apply the idea

In part (a) we found that height is the explanatory variable, and its categories form the rows of the table. So that the percentages within each height group add up to 100\%, Glen should use a row percentage frequency table.

c

Complete the row percentage frequency table for this data. Round your answers to the nearest percentage.

| | Play netball | Do not play netball | Total |
|---|---|---|---|
| \text{Height} \geq 170 \text{ cm} | ⬚\% | ⬚\% | ⬚\% |
| \text{Height} \lt 170 \text{ cm} | ⬚\% | ⬚\% | ⬚\% |

Worked Solution
Create a strategy

Divide each number of students by the total number of students in that row and multiply the answer by 100\%.

Apply the idea

Here is the original table with the row totals:

| | Play netball | Do not play netball | Total |
|---|---|---|---|
| \text{Height} \geq 170 \text{ cm} | 33 | 72 | 105 |
| \text{Height} \lt 170 \text{ cm} | 13 | 30 | 43 |

Here is the table with the calculations for the row percentages:

| | Play netball | Do not play netball | Total |
|---|---|---|---|
| \text{Height} \geq 170 \text{ cm} | \dfrac{33}{105}\times 100\% \approx 31\% | \dfrac{72}{105}\times 100\% \approx 69\% | 100\% |
| \text{Height} \lt 170 \text{ cm} | \dfrac{13}{43}\times 100\% \approx 30\% | \dfrac{30}{43}\times 100\% \approx 70\% | 100\% |

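The working in this table can also be generated with a short Python sketch that first finds each row total and then each row percentage. The helper `row_percentage_working` is hypothetical, and rounding halves up is our assumption about what "nearest percentage" means.

```python
import math


def round_half_up(x: float) -> int:
    """Round a non-negative number to the nearest whole number, with .5 rounding up."""
    return math.floor(x + 0.5)


# Counts from Glen's survey
glen = {
    "Height >= 170 cm": {"Play netball": 33, "Do not play netball": 72},
    "Height < 170 cm": {"Play netball": 13, "Do not play netball": 30},
}


def row_percentage_working(table: dict[str, dict[str, int]]) -> list[str]:
    """Return the working 'count/total x 100% ≈ p%' for every cell, row by row."""
    lines = []
    for row, cells in table.items():
        total = sum(cells.values())  # row total, e.g. 105 for the taller group
        for col, count in cells.items():
            pct = round_half_up(count / total * 100)
            lines.append(f"{row}, {col}: {count}/{total} x 100% ≈ {pct}%")
    return lines


for line in row_percentage_working(glen):
    print(line)
```

The four printed lines give 31\% and 69\% for the taller group and 30\% and 70\% for the shorter group, and each row adds to 100\%.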
d

Looking at the columns of the completed table, does there appear to be an association between height and playing netball?

A
No, there does not appear to be any association as the numbers are similar.
B
Yes, there appears to be an association as the numbers are quite different. It seems that taller people like to play netball.
Worked Solution
Create a strategy

In a row percentage table, if the values are similar then we say there is NO clear association.

Apply the idea

The completed row percentage table is shown:

| | Play netball | Do not play netball | Total |
|---|---|---|---|
| \text{Height} \geq 170 \text{ cm} | 31\% | 69\% | 100\% |
| \text{Height} \lt 170 \text{ cm} | 30\% | 70\% | 100\% |

The values in each column are quite similar, differing by only 1\%. So there does not appear to be any association between height and playing netball, option A.

Idea summary

To find the row percentage for a particular value, divide the value in the table by the row total and multiply by 100\%.

To find the column percentage for a particular value, divide the value in the table by the column total and multiply by 100\%.
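The two rules can be written with general notation. Here n_{ij} is the count in row i and column j, R_i is the total of row i and C_j is the total of column j; these symbols are our own, not the lesson's. Applying both rules to the same cell of Glen's table (the 33 students who are 170 cm or taller and play netball) shows why the choice of table matters.

```latex
\begin{aligned}
\text{row percentage}    &= \dfrac{n_{ij}}{R_i} \times 100\% \\
\text{column percentage} &= \dfrac{n_{ij}}{C_j} \times 100\% \\[6pt]
\dfrac{33}{105} \times 100\% &\approx 31\% \quad \text{(row percentage)} \\
\dfrac{33}{46} \times 100\% &\approx 72\% \quad \text{(column percentage)}
\end{aligned}
```

So 31\% of students who are 170 cm or taller play netball, while 72\% of the netball players are 170 cm or taller.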

Associations between variables

To find if there is an association between the variables in the Titanic table we can ask the question "Is survival rate dependent on the class of the passenger?"

To answer this, we must first identify the explanatory variable. In this problem it is the class of passenger. The categories of the explanatory variable form the column headings, so the column percentage frequency table will best reveal any patterns.

| | First Class | Second Class | Third Class | Crew |
|---|---|---|---|---|
| Survived | 62\% | 41\% | 25\% | 23\% |
| Died | 38\% | 59\% | 75\% | 77\% |
| Total | 100\% | 100\% | 100\% | 100\% |

When we read across the first row in the column percentage frequency table and look at the numbers we can see a clear difference in the percentages of passengers that lived or died in each class. This suggests there is an association between the class of passenger and the rate of survival. It appears that the higher the class of passenger, the higher the rate of survival.

Which percentage frequency table to use?

If the explanatory variable forms the column headings then we use the column percentage frequency table to look for association.

If the explanatory variable forms the row headings then we use the row percentage frequency table to look for association.

How do we tell if there is an association?

If it's a column percentage table then look across the rows for differences in the values. If the values are similar then we say there is NO clear association.

If it's a row percentage table then look down the columns for differences in the values. If the values are similar then we say there is NO clear association.
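The "look for differences" rule can be sketched in Python by arranging each percentage table so that the outer key is the explanatory variable, whichever way it is laid out on the page, and then measuring the largest gap between its categories for one response category. The lesson only says "similar" or "different"; the 10-percentage-point cut-off below is a hypothetical choice for illustration, not a VCE rule, and borderline cases still need judgement. The percentages come from the Titanic table, Example 1 and Example 2.

```python
# Percentages arranged as {explanatory category: {response category: percent}}
titanic_by_class = {
    "First Class": {"Survived": 62, "Died": 38},
    "Second Class": {"Survived": 41, "Died": 59},
    "Third Class": {"Survived": 25, "Died": 75},
    "Crew": {"Survived": 23, "Died": 77},
}
netball_by_height = {
    "Height >= 170 cm": {"Play netball": 31, "Do not play netball": 69},
    "Height < 170 cm": {"Play netball": 30, "Do not play netball": 70},
}
gym_by_gender = {
    "Male": {"Cardio": 30, "Weight": 70},
    "Female": {"Cardio": 73, "Weight": 27},
}

# HYPOTHETICAL cut-off in percentage points; the course gives no fixed number.
THRESHOLD = 10


def max_difference(percentages: dict[str, dict[str, int]], response: str) -> int:
    """Largest gap between explanatory categories for one response category."""
    values = [cells[response] for cells in percentages.values()]
    return max(values) - min(values)


def appears_associated(percentages: dict[str, dict[str, int]], response: str,
                       threshold: int = THRESHOLD) -> bool:
    """True if the percentages differ by more than the (assumed) threshold."""
    return max_difference(percentages, response) > threshold


for name, table, response in [
    ("Titanic", titanic_by_class, "Survived"),
    ("Netball", netball_by_height, "Play netball"),
    ("Gym", gym_by_gender, "Cardio"),
]:
    gap = max_difference(table, response)
    verdict = "appears associated" if appears_associated(table, response) else "no clear association"
    print(f"{name}: largest gap {gap} percentage points -> {verdict}")
```

The gaps are 39, 1 and 43 percentage points, matching the conclusions reached for the Titanic data and in Examples 1 and 2.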

How to describe the association?

First state if there is or is not an association apparent. For example: There appears to be an association between the variables.

Next describe the association. For example: The class of passenger affects the survival rate of the passengers.

Finally give an example. For example: The higher the class of passenger, the more likely the passenger was to survive.

An overview of the percentages in two-way tables can bring to light clear associations. The presence of more subtle associations and an objective measure of the significance of such associations requires additional analysis and methods from further studies in statistics.

Note: The term 'association' is used to describe a relationship between variables. An association does not mean that one variable causes the other to change, only that a change in one variable appears to be linked to a change in the other.

Examples

Example 2

Members of a gym were asked what kind of training they do. Each member did only one kind of training. The table shows the results:

| | Cardio | Weight |
|---|---|---|
| Male | 11 | 26 |
| Female | 46 | 17 |

a

Which variable is the explanatory variable?

A
Gender
B
Type of training
Worked Solution
Create a strategy

Choose the variable that, when it changes, may affect the other variable.

Apply the idea

A gym member's choice of training may depend on their gender, whereas their gender will not change regardless of the type of training they choose. So gender is the explanatory variable, option A.

b

To examine if there is an association between the type of training and gender, should we use a column or row percentage frequency table?

A
Row
B
Column
Worked Solution
Create a strategy

Calculate the percentages within each category of the explanatory variable, so that the percentages for each of its categories add up to 100\%.

Apply the idea

In part (a) we found that gender is the explanatory variable, and its categories form the rows of the table. So that the percentages for each gender add up to 100\%, we should use a row percentage frequency table, option A.

c

Complete the row percentage frequency table for this data. Round your answers to the nearest percentage.

| | Cardio | Weight | Total |
|---|---|---|---|
| Male | ⬚\% | ⬚\% | ⬚\% |
| Female | ⬚\% | ⬚\% | ⬚\% |

Worked Solution
Create a strategy

Divide each count by the total number of gym members of that gender (the row total) and multiply the answer by 100\%.

Apply the idea

Here is the original table with the row totals:

| | Cardio | Weight | Total |
|---|---|---|---|
| Male | 11 | 26 | 37 |
| Female | 46 | 17 | 63 |

Here is the table with the calculations for the row percentages:

| | Cardio | Weight | Total |
|---|---|---|---|
| Male | \dfrac{11}{37}\times 100\% \approx 30\% | \dfrac{26}{37}\times 100\% \approx 70\% | 100\% |
| Female | \dfrac{46}{63}\times 100\% \approx 73\% | \dfrac{17}{63}\times 100\% \approx 27\% | 100\% |

d

Looking at the columns of the completed table, does there appear to be an association between the type of training and the gender of gym members?

A
Yes, there appears to be an association as the numbers are quite different. Men seem to prefer weights, while women seem to prefer cardio.
B
No, there does not appear to be any association as the numbers are different.
Worked Solution
Create a strategy

In a row percentage table, if the values in a column are quite different, then there appears to be an association between the variables.

Apply the idea

The completed row percentage table is shown:

| | Cardio | Weight | Total |
|---|---|---|---|
| Male | 30\% | 70\% | 100\% |
| Female | 73\% | 27\% | 100\% |

The values in each column differ greatly (for example, 30\% of men compared with 73\% of women do cardio). Based on the table, men seem to prefer weights, while women seem to prefer cardio. So there appears to be an association between gender and the type of training, option A.

e

Does a person’s gender cause them to choose a certain type of training?

A
Yes. As we saw, women prefer to do cardio and men prefer to do weights.
B
No, association is not causation. There appears to be an association but we cannot say whether one variable causes the other.
Worked Solution
Create a strategy

Use the answers to parts (c) and (d) to decide whether a person's gender causes them to choose a certain type of training.

Apply the idea

Although the table in part (c) shows that, among these gym members, men seem to prefer weights while women seem to prefer cardio, this only shows an association. The data cannot tell us whether gender causes the choice of training, so the correct answer is option B.

Idea summary
  • If the explanatory variable forms the column headings then we use the column percentage frequency table to look for association.

    • Then look across the rows for differences in the values. If the values are similar then we say there is NO clear association.

  • If the explanatory variable forms the row headings then we use the row percentage frequency table to look for association.

    • Then look down the columns for differences in the values. If the values are similar then we say there is NO clear association.

To describe the association:

  1. State whether there is or is not an association apparent.

  2. Describe the association. e.g. The class of passenger affects the survival rate of the passengers.

  3. Give an example. e.g. The higher the class of passenger, the more likely the passenger was to survive.

100\% stacked column graphs

Associations between variables can often be seen more clearly in a stacked column graph. Below is a stacked column graph (also called a segmented column graph) for the data from the Titanic table earlier. Looking at each column, we can see that the proportions of deaths and survivals differ from class to class. This indicates there is an association between the variables.

[Figure: A 100\% stacked column graph showing the death and survival rates in each class on the Titanic.]

If there is no association, then the sections in each column are about the same size. In the graph below, each column is divided into sections of similar size. This indicates there is NO clear association between household composition and distribution of money.

[Figure: A 100\% stacked column graph showing the distribution of money in each household type.]

How to draw a stacked column graph?

  • Label the horizontal axis with the categories of the explanatory variable.

  • Label the vertical axis with percentages from 0\% to 100\%.

  • Draw a column for each category of the explanatory variable that reaches the height of 100\% on the vertical axis.

  • Divide each column into the percentages shown in the percentage frequency table: start from the bottom of the column, count up to the first percentage and draw a horizontal line to mark it off, then count up the second percentage from that line and mark it off, continuing until all sections are complete.

  • Label each section with its category of the response variable and its percentage, or provide a key.
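The "count up from the bottom" step amounts to a running total of the percentages in each column. This sketch, using a hypothetical helper `segment_boundaries`, lists where each section starts and ends for the Titanic column percentages. It assumes the rounded percentages in each column add to exactly 100\%; after rounding they can occasionally total 99\% or 101\%, in which case the top section needs adjusting by hand.

```python
from itertools import accumulate


def segment_boundaries(column: dict[str, int]) -> list[tuple[str, int, int]]:
    """Return (label, bottom, top) for each section, stacking upwards from 0%."""
    tops = list(accumulate(column.values()))  # running totals: where each line is drawn
    bottoms = [0] + tops[:-1]                 # each section starts where the last one ended
    return list(zip(column.keys(), bottoms, tops))


# Column percentages from the Titanic table; the bottom section is listed first
titanic_columns = {
    "First Class": {"Survived": 62, "Died": 38},
    "Second Class": {"Survived": 41, "Died": 59},
    "Third Class": {"Survived": 25, "Died": 75},
    "Crew": {"Survived": 23, "Died": 77},
}

for cls, column in titanic_columns.items():
    print(cls, segment_boundaries(column))
```

For first class, the line between the sections is drawn at 62\% and the column finishes at 100\%.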

Examples

Example 3

A group of year 12 students surveyed their class and recorded the hair colour and eye colour for each student. The results are displayed in the 100\% stacked column chart shown below.

[Figure: A 100\% stacked column graph of hair colour for each eye colour.]
a

What is the explanatory variable for this chart?

A
Eye colour
B
Hair colour
Worked Solution
Create a strategy

In a stacked column graph, the explanatory variable is placed along the horizontal axis.

Apply the idea

The stacked column graph has the eye colour placed along the horizontal axis. So it is the explanatory variable, option A.

b

Does the chart suggest an association between eye colour and hair colour?

A
Yes, as the corresponding segments are similar in size.
B
Yes, as the corresponding segments are of different sizes.
C
No, as the corresponding segments are similar in size.
D
No, as the corresponding segments are of different sizes.
Worked Solution
Create a strategy

Check whether the corresponding segments in each column are similar in size.

Apply the idea

The segments in the stacked column graph differ in size, which suggests an association between eye colour and hair colour, option B.

c

Can we say that having blue eyes causes a high chance of having blonde hair?

A
Yes. The data shows that students with blue eyes are more likely to have blonde hair.
B
No. There appears to be an association, but we cannot say that one causes the other.
Worked Solution
Create a strategy

Consider the difference between association and causation.

Apply the idea

An association does not mean one variable causes the other variable to change but that a change in one variable appears to affect the other.

Hair colour can be influenced by many factors. Having blue eyes may be associated with the chance of having blonde hair, but we cannot say that it causes it. So the correct answer is option B.

Idea summary

How to draw a stacked column graph?

  • Label the horizontal axis with the categories of the explanatory variable.

  • Label the vertical axis with percentages from 0\% to 100\%.

  • Draw a column for each category of the explanatory variable that reaches the height of 100\% on the vertical axis.

  • To divide each column into the percentages, start from the bottom of the column, count up to the first percentage and draw a horizontal line to mark it off, then count up the second percentage from that line and mark it off, continuing until all sections are complete.

  • Label each section with its category of the response variable and its percentage, or provide a key.

Outcomes

U3.AoS1.19

construct two-way tables and use them to identify and describe associations between two categorical variables
