If you've ever seen a poll or a popularity survey, you might be familiar with graphs that look something like these:
In terms of representing data in a visually appealing and digestible manner, three of the most common tools are column graphs, histograms and pie charts.
Unlike the dot plot and stem-and-leaf plot, these graphs focus more on representing the relation between different results visually while worrying less about displaying the exact values of the survey. It is for this reason that these charts are often used to represent large data sets.
A column graph always presents a categorical data set. There is one column per category, and the height of each column is the size of that category. There is usually a small gap between each column.
A histogram always presents a numerical data set. There is one column per number (or class of numbers), and the height of each column is the number of times that number appears in the set (or in the class). There is a gap equal to half a column's width between the vertical axis and the first column, and there are no gaps between columns.
Both chart types always label both axes to provide additional information about the data and tell us what type of values we have.
Consider the histogram below:
We can quickly see that, since the column labelled $1$ is the tallest, the mode of the data is $1$. We can also see that the column labelled $0$ has a value of three, and since column $4$ is at the same height, both $0$ and $4$ have a value of three.
The vertical axis label tells us that the values represent the "number of families" while the horizontal axis label tells us that each column represents a specific "number of children in the family".
Putting this information together, we can see that in the survey there were an equal number of families that had $0$ and $4$ children; three families in each case.
If this histogram feels strangely familiar it is probably because we have already seen this graph in the previous lesson, except in that lesson it was represented as a dot plot.
These two graphs look so similar, aside from representing the same data, because the histogram is essentially a more complex version of the dot plot. Rather than counting dots, the histogram uses a scale to indicate the height of the columns, allowing it to represent larger data sets.
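To see how the column heights of a histogram come from counting, here is a minimal Python sketch. The list of survey responses is hypothetical: it was made up only to match what the text says about the graph (three families with $0$ children, three with $4$, and a mode of $1$), so the other counts are assumptions.

```python
from collections import Counter

# Hypothetical responses: the number of children in each surveyed family.
# Only the counts for 0 and 4 (three each) and the mode (1) come from the text.
children_per_family = [0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4]

# The height of each column is how many times that value appears.
frequency = Counter(children_per_family)

# The mode is the value with the tallest column.
mode = max(frequency, key=frequency.get)

# Print a sideways text histogram: one '#' per family.
for value in sorted(frequency):
    print(value, "#" * frequency[value])
print("mode:", mode)
```

Each row of `#` symbols plays the role of a column, which is also why the result resembles a dot plot.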
Below is a column graph showing the type of fruit each student in a class brought in for lunch yesterday.
What was the most common fruit?
Apple
Orange
Banana
Mango
Mandarin
How many more mandarins were brought than mangoes?
Complete the table below using the information from the graph.
Fruit | Number |
---|---|
Banana | $3$ |
Apple | $\editable{}$ |
Mandarin | $\editable{}$ |
Mango | $\editable{}$ |
Orange | $\editable{}$ |
The scale of a column graph or histogram (indicated by the numbers and ticks on the vertical axis of the graph) is what lets us read values from the columns, so we should learn how to read it.
The scale can always be read as if it were a number line: the marked numbers indicate the value at certain heights, and the ticks between them can be used to determine the values at other heights.
Consider the column graph below:
How many more people attended the theatre on Friday than Tuesday?
Think: To find how many more people attended, we want to subtract the number of people who attended on Tuesday from the number on Friday. We can obtain both values from the column graph.
Do: Looking at the column graph, we can see that the height of the Tuesday column lines up with the number $20$. Meanwhile, the height of the Friday column lines up with the second tick above $30$.
Since there are five ticks to get from $20$ to $30$, we know that the distance between each tick will be $2$. So the height of the Friday column will be $34$. Evaluating the subtraction gives us $34-20=14$.
So does this mean that only $14$ more people attended the theatre on Friday compared to Tuesday? Not quite.
If we check the label for the vertical axis we can see that these values represent "thousands of people". Taking this into account, we find that the difference between the two days was $14000$ people.
Therefore, $14000$ more people attended the theatre on Friday compared to Tuesday.
Reflect: To read values off a column graph, we can use the marked numbers and the ticks between them to help us find the heights of columns, then check the vertical axis label to determine what the heights represent.
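The scale-reading steps from this example can be sketched in Python. All numbers are taken from the worked example; the variable names are only illustrative.

```python
# Values read from the vertical axis in the worked example.
lower_mark = 20          # a labelled value on the axis
upper_mark = 30          # the next labelled value
steps_between_marks = 5  # tick steps needed to get from 20 to 30

# Each tick step is worth the gap between marks divided by the number of steps.
tick_size = (upper_mark - lower_mark) / steps_between_marks

tuesday = 20                         # column lines up with the mark at 20
friday = upper_mark + 2 * tick_size  # column reaches the second tick above 30

# The axis label says the values are in thousands of people.
units = 1000
difference = (friday - tuesday) * units
print(difference)
```

Multiplying by `units` only at the end mirrors the last step of the example, where the axis label turns a difference of $14$ into $14000$ people.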
As we can see, the scale of a column graph or histogram is important for reading values. However, there are some cases where the scale may be misleading.
Consider the column graph below:
If we look only at the columns and ignore the scale, it certainly seems like the number of albums sold doubled from 2014 to 2015. However, if we look at the scale, we can see that the numbers of albums sold in 2014 and 2015 were $25$ thousand and $30$ thousand respectively, so it is clearly not the case that the sales doubled.
The heights of the columns are misleading because the scale doesn't start at zero on this graph. Because it starts at $20$ thousand, the height of each column actually indicates how much greater than $20$ thousand the value is.
So we should always check the scale to find actual values before making any conclusions about the data.
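As a quick check of the album example, the sketch below compares the ratio of the drawn column heights with the ratio of the actual values. The figures come from the text; the variable names are illustrative.

```python
axis_start = 20  # thousand albums: where the vertical axis begins
sales_2014 = 25  # thousand albums, read from the scale
sales_2015 = 30  # thousand albums, read from the scale

# Drawn heights are measured from the start of the axis, not from zero.
drawn_ratio = (sales_2015 - axis_start) / (sales_2014 - axis_start)

# The real comparison uses the values themselves.
actual_ratio = sales_2015 / sales_2014

print("columns suggest a factor of", drawn_ratio)
print("actual sales changed by a factor of", round(actual_ratio, 2))
```

The 2015 column is drawn twice as tall as the 2014 column, yet the sales grew by a factor of only $1.2$, which is an increase of $20\%$.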
The table shows the number of people who visited Disneyland between 2011 and 2015.
Year | Number of people (in hundred-thousands) |
---|---|
2011 | $165$ |
2012 | $164$ |
2013 | $152$ |
2014 | $159$ |
2015 | $168$ |
Represent the data in this column graph:
A marketing executive examines the column graph and says, "We doubled the number of visitors from 2014 to 2015!"
Are they correct?
Yes
No
Pie charts are, at first glance, completely different from column graphs and histograms. The main similarity is that the mode of a pie chart is clearly visible, just as it is on a histogram.
What makes a pie chart so different is that it represents the data as parts of a whole. In a pie chart, all the data is combined to make a single whole, with the different sectors representing different categories. The larger the sector, the larger the percentage of the data points that category represents.
Consider the pie chart below:
We can see from the pie chart (using the legend to check our categories) that the red sector takes up half the circle, while the blue sector takes up a quarter and the yellow and orange sectors both take up one eighth.
The fraction of the circle taken up by each sector indicates what fraction of the total fish are that colour. So, in this case, half the fish are red since the red sector takes up half the circle. We can also write this as a percentage: $50\%$ of the fish are red.
If we consider how much of the circle each sector takes up, we can identify what percentage of the total fish are of each colour.
Colour of fish | Fraction of total | Percentage |
---|---|---|
Orange | $\frac{1}{8}$ | $12.5\%$ |
Red | $\frac{1}{2}$ | $50\%$ |
Blue | $\frac{1}{4}$ | $25\%$ |
Yellow | $\frac{1}{8}$ | $12.5\%$ |
Notice that the sum of our percentages is $100\%$. This is consistent with the fact that a pie chart represents $100\%$ of the data, one whole, split up into different category sectors.
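The conversion from sector fractions to percentages, and the check that they make up one whole, can be sketched in Python. The fractions are those in the table; using the standard `fractions` module is just one convenient way to keep the arithmetic exact.

```python
from fractions import Fraction

# Fraction of the circle taken up by each sector (from the table above).
sector_fractions = {
    "Orange": Fraction(1, 8),
    "Red": Fraction(1, 2),
    "Blue": Fraction(1, 4),
    "Yellow": Fraction(1, 8),
}

# A percentage is the fraction multiplied by 100.
percentages = {colour: float(f * 100) for colour, f in sector_fractions.items()}

# The sectors of a pie chart must add up to exactly one whole.
whole = sum(sector_fractions.values())

for colour, percent in percentages.items():
    print(colour, percent)
print("total fraction:", whole)
```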
A notable drawback of the pie chart is that it doesn't necessarily tell us how many data points belong to each category. This means that, without any additional information, the pie chart can only show us which categories are more or less popular and roughly by how much.
It is for this reason that we will often add some additional information to our pie charts so that we can show (or at least calculate) the number of data points in each category. There are two main ways to add information to a pie chart:
By revealing the total number of data points, we can use the percentages represented by the sector sizes to calculate how many data points each sector represents.
Consider the pie chart below:
If there are $48$ fish in total, how many of them are either blue or yellow?
Think: We found in the exploration above that $25\%$ of the fish are blue and $12.5\%$ are yellow. Together this represents $37.5\%$ of the $48$ fish.
Do: We can find the number of blue or yellow fish by multiplying the total number of fish by the percentage taken up by these two colours.
Blue or yellow fish | $=$ | $48\times37.5\%$ |
 | $=$ | $48\times\frac{3}{8}$ |
 | $=$ | $18$ |
As shown, $18$ fish are either blue or yellow.
Reflect: By relating the sizes of sectors to fractions or percentages, we can calculate the number of data points belonging to a category by multiplying that fraction (or percentage) by the total number of data points.
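The same calculation can be written as a short Python sketch, using the total and the sector fractions from the example. The variable names are illustrative.

```python
from fractions import Fraction

total_fish = 48

# Sector sizes as fractions of the whole circle.
blue = Fraction(1, 4)    # 25%
yellow = Fraction(1, 8)  # 12.5%

# Combine the two sectors, then take that fraction of the total.
combined = blue + yellow
blue_or_yellow = total_fish * combined

print(combined)        # fraction of the fish that are blue or yellow
print(blue_or_yellow)  # number of fish that are blue or yellow
```

Working with $\frac{3}{8}$ rather than $37.5\%$ keeps every step exact, just as in the written working.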
Revealing the total number of data points is useful for calculating the value represented by each sector, but only if we can read the exact size of each sector from the pie chart.
In the case where it is not so obvious what percentage of the pie chart each sector represents, we can instead add information by explicitly stating how many data points each sector represents. This can be written either on the sectors or the legend, as shown below.
Consider the pie chart below:
Show that the sector representing basketball takes up $43\%$ of the pie chart.
Think: To show that the basketball sector takes up $43\%$ of the pie chart, we need to show that the number of basketball data points is equal to $43\%$ of the total data points.
Do: We can see from the pie chart that the basketball sector represents $86$ data points. By adding up the data points from all the different sectors, we find that the total number of data points is:
Total number of data points | $=$ | $86+27+53+30+4$ |
 | $=$ | $200$ |
So the percentage of the total number of data points represented by basketball is:
Percentage | $=$ | $\frac{86}{200}\times100\%$ |
 | $=$ | $43\%$ |
Since basketball represents $43\%$ of the data points, its sector must take up $43\%$ of the pie chart.
Reflect: We can calculate the exact percentage of the pie chart that different sectors take up by finding their number of data points as a percentage of the total.
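A minimal Python sketch of this calculation is shown below. The counts are those from the worked example; only the basketball sector is named in the text, so the remaining sectors are kept as an unlabelled list.

```python
basketball = 86
other_sectors = [27, 53, 30, 4]  # counts of the remaining sectors

# The whole pie chart represents every data point.
total = basketball + sum(other_sectors)

# A sector's percentage is its count as a share of the total, times 100.
basketball_percentage = basketball * 100 / total

print(total)
print(basketball_percentage)
```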
Aside from these two ways to add extra information to a pie chart, there is also the case where the percentage taken up by each sector is shown on the pie chart.
This will often look something like this:
This is very useful as it does a lot of the calculations for us. However, it is important that we always check that the percentages on the graph add up to $100\%$ since a pie chart always represents the whole of the data points, no more and no less.
In this particular case, the percentages do in fact add up to $100\%$ so this pie chart is valid.
Every student in year $8$ was surveyed on their favourite subject, and the results are displayed in this pie chart:
Which was the most popular subject?
Phys. Ed.
Maths
History
Languages
Science
English
What percentage of the class selected History, Phys. Ed., or Languages?
$50\%$
$30\%$
$3\%$
$25\%$
You later find out that $32$ students selected Science. How many students are there in year $8$?