If you've ever seen a poll or a popularity survey, you might be familiar with graphs that look something like these:
In terms of representing data in a visually appealing and digestible manner, three of the most common tools are column graphs, histograms and pie charts.
Unlike the dot plot and stem-and-leaf plot, these graphs focus more on representing the relation between different results visually while worrying less about displaying the exact values of the survey. It is for this reason that these charts are often used to represent large data sets.
A column graph always presents a categorical data set. There is one column per category, and the height of each column is the size of that category. There is usually a small gap between each column.
A histogram always presents a numerical data set. There is one column per number (or class of numbers), and the height of each column is the number of times that number appears in the set (or in the class). There is a gap equal to half a column's width between the vertical axis and the first column, and there are no gaps between columns.
Both chart types always label both axes to provide additional information about the data and tell us what type of values we have.
Consider the histogram below:
We can quickly see that, since the column labelled 1 is the tallest, the mode of the data is 1. We can also see that the columns labelled 0 and 4 are the same height, each with a value of three.
The vertical axis label tells us that the values represent the "number of families" while the horizontal axis label tells us that each column represents a specific "number of children in the family".
Putting this information together, we can see that in the survey there were an equal number of families that had 0 and 4 children; three families in each case.
If this histogram feels strangely familiar it is probably because we have already seen this graph in the previous lesson, except in that lesson it was represented as a dot plot.
These two graphs look so similar, aside from representing the same data, because the histogram is essentially a more complex version of the dot plot. Rather than counting dots, the histogram uses a scale to indicate the height of the columns, allowing it to represent larger data sets.
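To see that the height of each histogram column is just a count, here is a minimal Python sketch. The list `children_per_family` is made-up data chosen to match the description above (mode of 1, three families each with 0 and 4 children); it is not read from the actual graph.

```python
from collections import Counter

# Hypothetical survey answers: number of children in each family.
# These values are invented for illustration, not taken from the graph.
children_per_family = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4]

# The height of each histogram column is how many times a value appears.
column_heights = Counter(children_per_family)

# The mode is the value with the tallest column.
mode, tallest = column_heights.most_common(1)[0]

for value in sorted(column_heights):
    # One '#' per family, like a dot plot turned on its side.
    print(value, "#" * column_heights[value], column_heights[value])

print("mode:", mode)
```

With a larger data set the rows of `#` symbols would become too long to count by eye, which is why a histogram uses a scale instead.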
Below is a column graph showing the type of fruit each student in a class brought in for lunch yesterday.
What was the most common fruit?
How many more mandarins were brought than pineapples?
Complete the table below using the information from the graph.
Fruit | Number |
---|---|
Banana | 2 |
Apple | ⬚ |
Mandarin | ⬚ |
Pineapple | ⬚ |
Orange | ⬚ |
A column graph always presents a categorical data set, while a histogram always presents a numerical data set.
The scale of a column graph or histogram (indicated by the numbers and ticks on the vertical axis of the graph) is a very useful feature for this data presentation, so we should learn how to read it.
The scale can always be read as if it were a number line: the marked numbers indicate the value at certain heights, and the ticks between them can be used to determine the values at other heights.
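Reading a value from the ticks between two marked numbers can be sketched in Python as follows. The function `value_at_tick` and the example scale are hypothetical, and the sketch assumes the ticks are evenly spaced.

```python
def value_at_tick(lower_mark, upper_mark, ticks_between, tick_index):
    """Value at a tick lying between two marked numbers on the scale.

    ticks_between is the number of unlabelled ticks between the two marks.
    tick_index counts ticks up from lower_mark (1 is the first tick above it).
    Assumes the ticks are evenly spaced.
    """
    # The gap between the marks is split into (ticks_between + 1) equal steps.
    step = (upper_mark - lower_mark) / (ticks_between + 1)
    return lower_mark + tick_index * step


# Hypothetical scale: marks at 0 and 10 with four ticks between them,
# so each tick is worth 2 and the third tick sits at 6.
print(value_at_tick(0, 10, 4, 3))  # 6.0
```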
As we can see, the scale of a column graph or histogram is important for reading values. However, there are some cases where the scale may be misleading.
So we should always check the scale to find actual values before making any conclusions about the data.
Use the following applet to explore making column graphs. Drag the blue point on each column to adjust its height.
To complete the applet, sort the data first so that we can easily count how many there are of each animal and set the height of the matching column to that count.
The table shows the number of people who visited Disneyland between 2011 and 2015.
Year | Number of people in hundreds of thousands |
---|---|
2011 | 165 |
2012 | 164 |
2013 | 152 |
2014 | 159 |
2015 | 168 |
Represent the data in a column graph.
A marketing executive examines the column graph and says, "We doubled the number of visitors from 2014 to 2015." Are they correct?
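One way to test the claim is to compare the actual values from the table rather than the column heights. If the vertical axis of the graph starts near 150 instead of 0 (an assumption about how such a graph might be drawn), the 2015 column can look about twice as tall as the 2014 column even though the values are close. A short Python check, using only the numbers in the table:

```python
# Visitor numbers from the table, in hundreds of thousands.
visitors = {2011: 165, 2012: 164, 2013: 152, 2014: 159, 2015: 168}

# Compare 2015 with 2014 using the actual values.
ratio = visitors[2015] / visitors[2014]
percent_increase = (ratio - 1) * 100

print(f"2015 is {ratio:.2f} times 2014")     # 2015 is 1.06 times 2014
print(f"increase: {percent_increase:.1f}%")  # increase: 5.7%

# Doubling would need this many visitors (in hundreds of thousands) in 2015.
print(2 * visitors[2014])  # 318
```

The number of visitors rose by about 5.7\%, so the executive is not correct.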
We should always check the scale to find actual values before making any conclusions about the data.
Pie charts are, at first glance, completely different from column graphs and histograms. The main similarity is that the mode of a pie chart is clearly visible, just as it is on a histogram.
What makes a pie chart so different is that it represents the data as parts of a whole. In a pie chart, all the data is combined to make a single whole, with the different sectors representing different categories. The larger the sector, the larger the percentage of the data points that category represents.
Consider the pie chart below:
Sector | Fraction of total | Percentage |
---|---|---|
Orange | \dfrac{1}{8} | 12.5\% |
Red | \dfrac{1}{2} | 50\% |
Blue | \dfrac{1}{4} | 25\% |
Yellow | \dfrac{1}{8} | 12.5\% |
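The percentages in the table come from multiplying each fraction by 100. As a small Python sketch (the variable names are our own), we can also confirm that the four fractions make up exactly one whole:

```python
from fractions import Fraction

# Fraction of the whole taken up by each sector, from the table.
sectors = {
    "Orange": Fraction(1, 8),
    "Red": Fraction(1, 2),
    "Blue": Fraction(1, 4),
    "Yellow": Fraction(1, 8),
}

# Convert each fraction to a percentage.
percentages = {name: float(frac * 100) for name, frac in sectors.items()}
print(percentages)
# {'Orange': 12.5, 'Red': 50.0, 'Blue': 25.0, 'Yellow': 12.5}

# Together the sectors must make exactly one whole.
print(sum(sectors.values()) == 1)  # True
```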
A notable drawback of the pie chart is that it doesn't necessarily tell us how many data points belong to each category. This means that, without any additional information, the pie chart can only show us which categories are more or less popular and roughly by how much.
It is for this reason that we will often add some additional information to our pie charts so that we can show (or at least calculate) the number of data points in each category. There are two main ways to add information to a pie chart:
By revealing the total number of data points, we can use the percentages represented by the sector sizes to calculate how many data points each sector represents.
Alternatively, the percentage taken up by each sector can be shown on the pie chart itself.
This will often look something like this:
This is very useful as it does a lot of the calculations for us. However, it is important that we always check that the percentages on the graph add up to 100% since a pie chart always represents the whole of the data points, no more and no less.
In this particular case, the percentages do in fact add up to 100% so this pie chart is valid.
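Both kinds of additional information can be combined in a short Python sketch. The function names, the sector percentages, and the total of 40 data points are all hypothetical values chosen for illustration; they are not read from the chart above.

```python
def sector_counts(percentages, total):
    """Number of data points in each sector, given the total."""
    # A valid pie chart represents the whole data set: exactly 100%.
    if abs(sum(percentages.values()) - 100) > 1e-9:
        raise ValueError("percentages must add up to 100")
    return {name: total * pct / 100 for name, pct in percentages.items()}


def total_from_sector(count, percentage):
    """Total number of data points, given one sector's count and percentage."""
    return count * 100 / percentage


# Hypothetical pie chart with 40 data points in total.
example = {"Orange": 12.5, "Red": 50, "Blue": 25, "Yellow": 12.5}
print(sector_counts(example, 40))
# {'Orange': 5.0, 'Red': 20.0, 'Blue': 10.0, 'Yellow': 5.0}

# Working backwards: 5 data points making up 12.5% means 40 in total.
print(total_from_sector(5, 12.5))  # 40.0
```

The second function works in the opposite direction: if we know how many data points one sector represents and its percentage, we can recover the total.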
Every student in year 8 was surveyed on their favourite subject, and the results are displayed in this pie chart:
Which was the most popular subject?
What percentage of the class selected History, Phys. Ed., or Languages?
You later find out that 32 students selected Science. How many students are there in year 8?
A pie chart represents the data as parts of a whole. All the data is combined to make a single whole, with the different sectors representing different categories. The larger the sector, the larger the percentage of data in that category.
There are two main ways to add information to a pie chart: revealing the total number of data points, or showing the percentage taken up by each sector.