Australia (VIC)
VCE 11 General 2023

1.02 Categorical data

Lesson

In the world of data, we are often interested in the number of times, or frequency, that something occurs. It could be the number of road accidents caused by drink driving, the number of hot days in a year, or the number of visits to a website in a month.

In situations like these, where the same data value can occur multiple times, the data can be organised into a frequency table.

 

Categorical data and frequency tables

As an example, say that the colour of every car that passed through a given intersection was recorded over a ten-minute period:

green, white, yellow, white, black, green, black, blue, blue, silver, white, black, green, blue, blue, white, black, silver, silver, red, red, red, black, white, blue, white, black, silver, silver, white, blue, white, black, yellow, blue, white, white, red, green, silver, black, white, black, white.

The same colours are occurring multiple times, so it makes sense to organise the data using a frequency table.

Vehicle colour   Tally             Frequency
Black            ||||/ ||||        9
White            ||||/ ||||/ ||    12
Blue             ||||/ ||          7
Green            ||||              4
Silver           ||||/ |           6
Yellow           ||                2
Red              ||||              4

Notice that the frequency table has three columns:

  • The first column shows the subgroups within the data
  • The tally column (optional) uses tally marks to record the frequency of each subgroup
  • The final column sums the tally marks and records the frequency as a number
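The tally column can be generated directly from a count. The minimal Python sketch below assumes the convention of writing each complete group of five as four strokes crossed by a fifth, shown here as `||||/`; the function name `tally` is hypothetical and chosen only for this illustration.

```python
def tally(count: int) -> str:
    """Return tally marks for count, writing each full group of five as '||||/'."""
    # Split the count into complete groups of five plus any leftover strokes
    groups_of_five, remainder = divmod(count, 5)
    parts = ["||||/"] * groups_of_five
    if remainder:
        parts.append("|" * remainder)
    return " ".join(parts)


# A few rows of the vehicle colour table: category, tally, frequency
for colour, frequency in [("Black", 9), ("White", 12), ("Yellow", 2)]:
    print(f"{colour:<8}{tally(frequency):<16}{frequency}")
```

Grouping in fives is what makes tallies quick to total: each `||||/` is worth 5, so only the leftover strokes need to be counted one by one.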

The sum of the frequencies is equal to the total number of data values. In this case, 9 + 12 + 7 + 4 + 6 + 2 + 4 = 44, so the colours of 44 vehicles were recorded.
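Counting each category is exactly what a frequency table records, so the table can be checked by machine. Here is a minimal Python sketch using the standard library's `collections.Counter` on the 44 recorded car colours; the variable names are illustrative only.

```python
from collections import Counter

# Car colours recorded at the intersection, in the order they were observed
colours = (
    "green, white, yellow, white, black, green, black, blue, blue, silver, "
    "white, black, green, blue, blue, white, black, silver, silver, red, red, "
    "red, black, white, blue, white, black, silver, silver, white, blue, white, "
    "black, yellow, blue, white, white, red, green, silver, black, white, black, white"
).split(", ")

# Counter records how many times each colour occurs (its frequency)
frequencies = Counter(colours)

# List the categories from most to least frequent
for colour, frequency in frequencies.most_common():
    print(f"{colour.capitalize():<8}{frequency}")

# The frequencies add up to the total number of data values
total = sum(frequencies.values())
print("Total:", total)
```

Sorting with `most_common()` is a presentation choice; the frequencies themselves are the same in any order.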

The graph on the right is a column graph, which is discussed further below. Column graphs are well suited to displaying the frequencies of categorical data. Together, the table and the graph allow us to analyse the frequency distribution: how the frequency of outcomes is spread across the different categories.

Practice questions

Question 1

Mr. Rodriguez recorded the number of pets owned by each of the students in his class. He found that 15 people had no pets, 19 people had one pet, 3 people had two pets and 8 people had three pets.

Write Mr. Rodriguez's results in the frequency table below.

  1. Number of pets   Frequency
    0                 ____
    1                 ____
    2                 ____
    3                 ____

 

Column graphs

A column graph (or bar graph) is a graph used to display relative quantities of data in different categories. There is a wide range of topics where this type of graph can be useful: for example, the number of tourists visiting different destinations in a month, the number of people with each eye colour in a class, or the popularity of the top ten YouTube channels.

[Column graph: Eye colour in a class]

 

Column graph features to remember
  • Title the graph clearly
  • Label each axis
  • Label each category and give the vertical axis a clear scale
  • Make sure the columns are of equal width and equally spaced
  • Make sure that the height of each column matches the value it represents, which is often the frequency of the category (how often that outcome occurred)
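To see these features working together, here is a minimal Python sketch that draws a text-only column graph. The function `column_graph`, the `####` columns and the example title are hypothetical choices made for this sketch. It assumes whole-number frequencies and draws one row per unit, so the scale starts at 0 and each column's height equals its frequency.

```python
def column_graph(title: str, y_label: str, frequencies: dict[str, int]) -> list[str]:
    """Build the lines of a text column graph whose vertical scale starts at 0."""
    top = max(frequencies.values())
    lines = [title, y_label]  # a clear title and a labelled vertical axis
    # Draw from the highest value down to 1. A column is filled at a level
    # when its frequency reaches that level, so its height equals its value.
    for level in range(top, 0, -1):
        row = "".join("  ####  " if f >= level else "        "
                      for f in frequencies.values())
        lines.append(f"{level:>3} |{row}")
    # Baseline at 0; every column is 8 characters wide and equally spaced
    lines.append("  0 +" + "-" * (8 * len(frequencies)))
    # Label each category under its column
    lines.append("     " + "".join(f"{name:^8}" for name in frequencies))
    return lines


# Example: the vehicle colour frequencies from the table above
car_colours = {"Black": 9, "White": 12, "Blue": 7, "Green": 4,
               "Silver": 6, "Yellow": 2, "Red": 4}
for line in column_graph("Vehicle colours at an intersection", "Frequency", car_colours):
    print(line)
```

Drawing one row per unit keeps the sketch simple; with larger frequencies, a real graph would use a coarser scale, such as labelled marks every 5 with minor ticks in between.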

Worked example

Example 1

Consider the column graph below:

How many more people attended the theatre on Friday than Tuesday?

Think: To find how many more people attended, we want to subtract the number of people who attended on Tuesday from the number on Friday. We can obtain both values from the column graph.

Do: Looking at the column graph, we can see that the height of the Tuesday column lines up with the number 20. Meanwhile, the height of the Friday column lines up with the second tick above 30.

Since it takes five tick intervals to get from 20 to 30, each tick represents (30 − 20) ÷ 5 = 2. So the Friday column, two ticks above 30, has a height of 30 + 2 × 2 = 34. Evaluating the subtraction gives 34 − 20 = 14.

So does this mean that only 14 more people attended the theatre on Friday than on Tuesday? Not quite.

If we check the label on the vertical axis, we can see that these values represent "thousands of people". Taking this into account, the difference between the two days was 14 × 1000 = 14 000 people.

Therefore, 14 000 more people attended the theatre on Friday than on Tuesday.

Reflect: To read values off a column graph, we can use the marked numbers and the ticks between them to help us find the heights of columns, then check the vertical axis label to determine what the heights represent.
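The steps in this example can be written as a short Python sketch. The helper `tick_value` is a hypothetical name; the inputs (labelled marks at 20 and 30, five tick intervals between them, the Friday column two ticks above 30, and an axis measured in thousands) are the values read off the graph in the example.

```python
def tick_value(lower_mark: float, upper_mark: float, intervals: int) -> float:
    """Value of one tick interval between two labelled marks on the axis."""
    return (upper_mark - lower_mark) / intervals


step = tick_value(20, 30, 5)   # five intervals between the 20 and 30 marks
tuesday = 20                   # Tuesday column lines up with the 20 mark
friday = 30 + 2 * step         # Friday column is two ticks above the 30 mark
difference = friday - tuesday  # still in the axis units (thousands of people)
people = difference * 1000     # convert using the vertical axis label
print(step, friday, difference, people)
```

Keeping the unit conversion as a separate final step mirrors the advice above: read the heights first, then check what the axis label says they represent.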

As we can see, the scale of a column graph is important for reading values. However, there are some cases where the scale may be misleading.

Consider the column graph below:

If we look only at the columns and ignore the scale, it certainly seems as though the number of albums sold doubled from 2014 to 2015. However, the scale shows that 25 thousand albums were sold in 2014 and 30 thousand in 2015. Sales rose by only 5 thousand, an increase of 20%, so they clearly did not double.

The heights of the columns are misleading because the scale on this graph doesn't start at zero. Since it starts at 20 thousand, the height of each column shows only how much greater than 20 thousand its value is: the 2014 column rises 25 − 20 = 5 thousand above the bottom of the axis and the 2015 column rises 30 − 20 = 10 thousand, so the 2015 column is twice as tall.

So we should always check the scale to find the actual values before drawing any conclusions about the data.
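The effect of a scale that does not start at zero can be measured with a short Python sketch. It uses the values read from the album sales graph (an axis starting at 20 thousand, with sales of 25 thousand and 30 thousand); the variable names are illustrative only.

```python
baseline = 20    # vertical axis starts at 20 thousand, not 0
sales_2014 = 25  # thousands of albums
sales_2015 = 30  # thousands of albums

# Column heights are measured from the bottom of the axis, not from zero
height_2014 = sales_2014 - baseline
height_2015 = sales_2015 - baseline

apparent_ratio = height_2015 / height_2014  # how many times taller 2015 looks
actual_ratio = sales_2015 / sales_2014      # how many times larger sales really were

print(f"2015 column looks {apparent_ratio:.1f} times as tall as 2014")
print(f"2015 sales were actually {actual_ratio:.1f} times 2014 sales")
```

The gap between the two ratios is the distortion introduced by the truncated axis; with a baseline of 0, the two would be equal.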

Practice questions

Question 2

Miss Yen recorded her students' favourite pets in the column graph below.

A column graph titled "Student's favorite pets" displays the preferences of an unspecified number of students for five types of pets: hamsters, dogs, cats, fish and birds. The horizontal axis is labelled "Pets". The vertical axis is labelled "Number of students" and has a scale from 0 to 15, labelled in increments of 5, with minor ticks in increments of 1. Each type of pet has a column showing the number of students who favour that pet: Hamster 6, Dog 7, Cat 4, Fish 8 and Bird 11. The counts are not labelled on the columns themselves.
  1. Which type of pet is the most popular?

    A. Fish
    B. Bird
    C. Hamster
    D. Cat
    E. Dog
  2. Which pet is the least popular?

    A. Bird
    B. Cat
    C. Fish
    D. Dog
    E. Hamster
  3. How many students picked fish as their favourite pet?

  4. How many students were surveyed in total?

Question 3

John had 7 blue marbles, 4 black marbles, 8 yellow marbles, 13 white marbles and 6 red marbles. Use this information to complete the column graph.

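  1. [Column graph to complete: "Marbles", with "Color" (blue, black, yellow, white, red) on the horizontal axis and "Number of Marbles" (0 to 15, labelled every 5) on the vertical axis]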

Outcomes

U1.AoS1.2

the concept of a data distribution and its display using a statistical plot
