Two-way tables allow us to display and examine the relationship between two sets of categorical data. The categories are labelled at the top and the left side of the table, and the frequencies of the different characteristics appear in the interior of the table. Often the totals of each row and column are also shown.
The following are the statistics of the passengers and crew who sailed on the Titanic on its fateful maiden voyage in 1912.
| | First Class | Second Class | Third Class | Crew | Total |
|---|---|---|---|---|---|
| Survived | 202 | 118 | 178 | 212 | 710 |
| Died | 123 | 167 | 528 | 696 | 1514 |
| Total | 325 | 285 | 706 | 908 | 2224 |
Although it is interesting to know that 202 First Class passengers survived, it is far more useful to know the percentage breakdown of survivors by class. To do this we need to calculate row percentages. To find a row percentage, divide the value in the table by the row total and multiply by 100\%. The row percentages are shown in the table below, with working shown in some of the cells.
| | First Class | Second Class | Third Class | Crew | Total |
|---|---|---|---|---|---|
| Survived | \dfrac{202}{710} \times 100\% \approx 28\% | \dfrac{118}{710} \times 100\% \approx 17\% | 25\% | 30\% | 100\% |
| Died | \dfrac{123}{1514} \times 100\% \approx 8\% | 11\% | 35\% | 46\% | 100\% |
This percentage frequency table now gives us more useful information than the raw data. The first row gives us the percentage breakdown of survivors by class. We can now easily read that 46\% of the people who died were crew members whereas only 8\% of the people who died were in first class.
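The row percentage calculation can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the original lesson: the counts are copied from the Titanic table, and the names `counts` and `row_percentages` are made up for illustration.

```python
# Counts from the Titanic table, in the order First, Second, Third, Crew.
counts = {
    "Survived": [202, 118, 178, 212],
    "Died": [123, 167, 528, 696],
}


def row_percentages(table):
    """Divide each value by its row total and multiply by 100,
    rounding to the nearest whole percentage."""
    result = {}
    for label, values in table.items():
        row_total = sum(values)
        result[label] = [round(100 * value / row_total) for value in values]
    return result


row_pct = row_percentages(counts)
for label, values in row_pct.items():
    print(label, values)
# Survived [28, 17, 25, 30]
# Died [8, 11, 35, 46]
```

The printed values match the row percentage table: each row is compared against its own total, so each row adds to 100\%.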
We can also calculate the percentage of each class that survived or died. To do this we calculate column percentages. To find a column percentage, divide the value in the table by the column total and multiply by 100\%. The column percentages are shown in the table below, with working shown in some of the cells.
| | First Class | Second Class | Third Class | Crew |
|---|---|---|---|---|
| Survived | \dfrac{202}{325} \times 100\% \approx 62\% | \dfrac{118}{285} \times 100\% \approx 41\% | 25\% | 23\% |
| Died | \dfrac{123}{325} \times 100\% \approx 38\% | 59\% | 75\% | 77\% |
| Total | 100\% | 100\% | 100\% | 100\% |
This percentage frequency table gives us more useful information than the raw data. The first column gives us the percentage breakdown of survivals and deaths in first class. From the raw data we can see that a similar number of first class (202) and third class (178) passengers survived. However, this can be misleading, because there were more than twice as many third class passengers on board. The percentage frequency table shows us that 62\% of first class passengers survived whereas only 25\% of third class passengers survived.
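The column percentages can be checked in the same way. The following is a minimal Python sketch, not part of the original lesson: the counts come from the Titanic table, and the helper name `column_percentages` is an assumption made for illustration.

```python
# Counts from the Titanic table, in the order First, Second, Third, Crew.
classes = ["First Class", "Second Class", "Third Class", "Crew"]
survived = [202, 118, 178, 212]
died = [123, 167, 528, 696]


def column_percentages(*rows):
    """Divide each value by its column total and multiply by 100,
    rounding to the nearest whole percentage."""
    column_totals = [sum(column) for column in zip(*rows)]
    return [
        [round(100 * value / total) for value, total in zip(row, column_totals)]
        for row in rows
    ]


survived_pct, died_pct = column_percentages(survived, died)
for name, s, d in zip(classes, survived_pct, died_pct):
    print(f"{name}: {s}% survived, {d}% died")
# First Class: 62% survived, 38% died
# Second Class: 41% survived, 59% died
# Third Class: 25% survived, 75% died
# Crew: 23% survived, 77% died
```

Here each column is compared against its own total, so the two percentages in every column add to 100\%.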
Glen surveyed all the students in Year 12 at his school and summarised the results in the following table:
| | Play netball | Do not play netball | Total |
|---|---|---|---|
| \text{Height} \geq 170 \text{ cm} | 33 | 72 | 105 |
| \text{Height} \lt 170 \text{ cm} | 13 | 30 | 43 |
| \text{Total} | 46 | 102 | 148 |
Which variable is the explanatory variable?
To examine if there is an association between height and playing netball, should Glen use a column or row percentage frequency table?
Complete the row percentage frequency table for this data. Round your answers to the nearest percentage.
| | Play netball | Do not play netball | Total |
|---|---|---|---|
| \text{Height} \geq 170 \text{ cm} | ⬚\% | ⬚\% | ⬚\% |
| \text{Height} \lt 170 \text{ cm} | ⬚\% | ⬚\% | ⬚\% |
Looking at the columns of the completed table, does there appear to be an association between height and playing netball?
To find the row percentage for a particular value, divide the value in the table by the row total and multiply by 100\%.
To find the column percentage for a particular value, divide the value in the table by the column total and multiply by 100\%.
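Written out as formulas, the two rules look like this. The worked values reuse the first class survivors from the Titanic table. The layout is just one way of setting the rules out, not notation taken from the lesson.

```latex
\text{row percentage} = \dfrac{\text{value in the table}}{\text{row total}} \times 100\%
\qquad \text{e.g. } \dfrac{202}{710} \times 100\% \approx 28\%

\text{column percentage} = \dfrac{\text{value in the table}}{\text{column total}} \times 100\%
\qquad \text{e.g. } \dfrac{202}{325} \times 100\% \approx 62\%
```

The same value (202) gives two different percentages because it is compared with a different total each time.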
To find if there is an association between the variables in the Titanic table we can ask the question "Is survival rate dependent on the class of the passenger?"
To answer this we must first identify the explanatory variable. In this problem it is the class of passenger, because we are asking whether class helps to explain survival. The explanatory variable forms the column headings, so the column percentage frequency table will best indicate any patterns.
| | First Class | Second Class | Third Class | Crew |
|---|---|---|---|---|
| Survived | 62\% | 41\% | 25\% | 23\% |
| Died | 38\% | 59\% | 75\% | 77\% |
| Total | 100\% | 100\% | 100\% | 100\% |
When we read across the first row in the column percentage frequency table and look at the numbers we can see a clear difference in the percentages of passengers that lived or died in each class. This suggests there is an association between the class of passenger and the rate of survival. It appears that the higher the class of passenger, the higher the rate of survival.
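Reading across a row amounts to comparing the largest and smallest percentages in that row. The following is a minimal Python sketch of that comparison, not part of the original lesson. It uses the survival percentages from the column percentage table. The variable names are made up for illustration, and the lesson does not say how big a gap counts as "clear", so the code only reports the gap and leaves the judgement to the reader.

```python
# "Survived" row of the column percentage frequency table.
classes = ["First Class", "Second Class", "Third Class", "Crew"]
survived_pct = [62, 41, 25, 23]

highest = max(survived_pct)
lowest = min(survived_pct)
spread = highest - lowest  # gap in percentage points

print("Highest survival rate:", classes[survived_pct.index(highest)], highest)
print("Lowest survival rate:", classes[survived_pct.index(lowest)], lowest)
print("Spread across the row:", spread, "percentage points")
# Highest survival rate: First Class 62
# Lowest survival rate: Crew 23
# Spread across the row: 39 percentage points
```

If the percentages across the row were all close together, the spread would be small, which would point to no clear association.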
Which percentage frequency table to use?
If the explanatory variable forms the column headings then we use the column percentage frequency table to look for association.
If the explanatory variable forms the row headings then we use the row percentage frequency table to look for association.
How do we tell if there is an association?
If it's a column percentage table then look across the rows for differences in the values. If the values are similar then we say there is NO clear association.
If it's a row percentage table then look down the columns for differences in the values. If the values are similar then we say there is NO clear association.
How to describe the association?
First state if there is or is not an association apparent. For example: There appears to be an association between the variables.
Next describe the association. For example: The class of passenger affects the survival rate of the passengers.
Finally give an example. For example: The higher the class of passenger, the more likely the passenger was to survive.
An overview of the percentages in two-way tables can bring to light clear associations. The presence of more subtle associations and an objective measure of the significance of such associations requires additional analysis and methods from further studies in statistics.
Note: The term 'association' is used to describe a relationship between variables. An association does not mean that one variable causes the other to change, only that the values of one variable tend to differ as the other variable changes.
Members of a gym were asked what kind of training they do. Each of them only did one kind of training. The table shows the results:
| | Cardio | Weight |
|---|---|---|
| Male | 11 | 26 |
| Female | 46 | 17 |
Which variable is the explanatory variable?
To examine if there is an association between the type of training and gender, should we use a column or row percentage frequency table?
Complete the row percentage frequency table for this data. Round your answers to the nearest percentage.
| | Cardio | Weight | Total |
|---|---|---|---|
| Male | ⬚\% | ⬚\% | ⬚\% |
| Female | ⬚\% | ⬚\% | ⬚\% |
Looking at the columns of the completed table, does there appear to be an association between the type of training and the gender of gym members?
Does a person’s gender cause them to choose a certain type of training?
If the explanatory variable forms the column headings then we use the column percentage frequency table to look for association.
Then look across the rows for differences in the values. If the values are similar then we say there is NO clear association.
If the explanatory variable forms the row headings then we use the row percentage frequency table to look for association.
Then look down the columns for differences in the values. If the values are similar then we say there is NO clear association.
To describe the association:
State whether there is or is not an association apparent.
Describe the association. e.g. The class of passenger affects the survival rate of the passengers.
Give an example. e.g. The higher the class of passenger, the more likely the passenger was to survive.
Association between variables can often be seen more clearly in a stacked column graph. Below is a stacked column graph (also called a segmented column graph) for the data from the Titanic table earlier. When we look at the columns we can see that the proportion of deaths to survivals differs from column to column. This indicates there is an association between the variables.
If there is no association then the proportions of the sections in each column are the same. When we look at the graph below we can see that each column is divided into similar size sections. This indicates there is NO clear association between household composition and distribution of money.
How to draw a stacked column graph?
Label the horizontal axis with the categories of the explanatory variable.
Label the vertical axis with percentages from 0\% to 100\%.
Draw a column for each category of the explanatory variable that reaches the height of 100\% on the vertical axis.
To divide each column into the percentages shown in the frequency table, start from the bottom of the column, count up to the first percentage and draw a horizontal line to mark it off. Then count up by the second percentage from that line and mark off again, continuing until all sections are complete.
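The "count up and mark off" step is a running total of the percentages in each column. The following is a minimal Python sketch, not part of the original lesson, that works out where each section starts and ends for the Titanic column percentages. The function name `segment_boundaries` is made up for illustration, and putting "Survived" at the bottom of each column is an assumption; either order works as long as it is the same in every column.

```python
from itertools import accumulate

# Column percentages from the Titanic table, listed bottom section first.
column_pct = {
    "First Class": {"Survived": 62, "Died": 38},
    "Second Class": {"Survived": 41, "Died": 59},
    "Third Class": {"Survived": 25, "Died": 75},
    "Crew": {"Survived": 23, "Died": 77},
}


def segment_boundaries(segments):
    """Return (label, bottom, top) for each section of one column,
    counting up from 0% by each percentage in turn."""
    tops = list(accumulate(segments.values()))
    bottoms = [0] + tops[:-1]
    return list(zip(segments, bottoms, tops))


boundaries = {
    name: segment_boundaries(segments) for name, segments in column_pct.items()
}
for name, sections in boundaries.items():
    print(name, sections)
# First Class [('Survived', 0, 62), ('Died', 62, 100)]
# Second Class [('Survived', 0, 41), ('Died', 41, 100)]
# Third Class [('Survived', 0, 25), ('Died', 25, 100)]
# Crew [('Survived', 0, 23), ('Died', 23, 100)]
```

Each "top" value is the height at which a horizontal line is drawn, and the last section always finishes at 100\%.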
A group of Year 12 students surveyed their class and recorded the hair colour and eye colour of each student. The results are displayed in the 100\% stacked column chart shown below.
What is the explanatory variable for this chart?
Does the chart suggest an association between eye colour and hair colour?
Can we say that having blue eyes causes a high chance of having blonde hair?
How to draw a stacked column graph?
Label the horizontal axis with the categories of the explanatory variable.
Label the vertical axis with percentages from 0\% to 100\%.
Draw a column for each category of the explanatory variable that reaches the height of 100\% on the vertical axis.
To divide each column into the percentages, start from the bottom of the column, count up to the first percentage and draw a horizontal line to mark it off. Then count up by the second percentage from that line and mark off again, continuing until all sections are complete.