Graphs are a visual way of presenting information. They can be very useful as they help us sort and order the information we collect and present it in a clear, concise way. Selecting a good type of graph to display your data is important, and the best type of graph to choose will change depending on the type of information you need to display. Let's go through a few different types of graphs now.
Bar graph is a generic name for any graph that displays information using rectangular or cylindrical bars.
The sales (in thousands) of different products are shown in the following horizontal bar graph.
Which of the following is the best-selling product?
How many units of all products were sold in total?
If product B was sold at \$50 each, find the revenue generated by product B alone.
A column graph is the name for a specific type of bar graph that uses vertical bars, so that they appear like columns.
A survey of the preferred sport was done for a group of boys and the results are shown in the bar graph below:
How many boys prefer football to other sports?
Which type of sport is the most popular?
How many boys took part in the survey?
Bar graphs and column graphs are used to display categorical data.
There is one bar for each category and the height or length of the bar represents the frequency.
Bars are drawn with gaps to show that each value is a separate category.
Let's say we decided to conduct an experiment to find the most common car colour in the neighbourhood, and we are going to record the colours of the next 50 cars that drive past.
How would we keep track of what we saw? We could write a list, but that might look a bit messy and be a bit hard to understand, like the list below:
Green, white, yellow, white, black, green, black, blue, blue, gold, silver, white, black, gold, green, blue, purple, blue, white, black, gold, silver, silver, red, red, red, black, gold, red, blue, white, black, silver, silver, purple, pink, white, blue, red, black, yellow, blue, white, white, red, green, pink, black, white, red.
A nicer way to keep track of data like this is to create a frequency table and keep a tally of the results.
Frequency refers to how often an event occurs. We make use of frequency tables as an easy way to display data because we can have one column showing a list of the possible outcomes that may occur, a second column with tally marks of the frequency of each event (although this column isn't always included), and a third with the total frequency as a number. Frequency tables are useful for surveys, as you can keep a running total easily each time someone responds.
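As a rough illustration, the list of 50 car colours above can be turned into a frequency table with a few lines of Python. This is only a sketch: the variable names are our own, and the tally column is drawn with simple `|` characters rather than grouped tally marks.

```python
from collections import Counter

# The 50 car colours from the list above, written in lowercase.
cars = (
    "green, white, yellow, white, black, green, black, blue, blue, gold, "
    "silver, white, black, gold, green, blue, purple, blue, white, black, "
    "gold, silver, silver, red, red, red, black, gold, red, blue, "
    "white, black, silver, silver, purple, pink, white, blue, red, black, "
    "yellow, blue, white, white, red, green, pink, black, white, red"
).split(", ")

# Counter keeps a running total for each outcome, like a tally.
frequency = Counter(cars)

# Print one row per colour: outcome, tally marks, frequency as a number.
for colour, count in frequency.most_common():
    print(f"{colour:<8}{'|' * count:<10}{count:>3}")

print("Total:", sum(frequency.values()))
```

The total at the bottom is a useful check that every one of the 50 cars has been counted exactly once.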
In a survey some people were asked approximately how many minutes they take to decide between brands of a particular product.
Complete the frequency table.
How many people took part in the survey?
What proportion of people surveyed took 2 minutes to make a decision?
A divided bar graph is a graph in which a single bar represents the whole data set, and the bar is divided into several segments to represent the proportional size of each category.
A divided bar graph can be any length, but it helps to choose a length that makes the data easy to divide up; multiples of 5 or 10 are often good. Remember that you don't want it to be too long or too short.
To work out how much of the bar each colour represents, we write each colour as a fraction of the whole, then evaluate this fraction of the bar's length. For example, \dfrac{30}{100} or \dfrac{3}{10} of the jellybeans are green, and \dfrac{3}{10} \times 10 = 3. This means that 3 cm of the 10 cm bar should be given to the green jellybeans. Similarly, \dfrac{28}{100}\times 10=2.8, so 2.8 cm should be given to each of pink and orange, and 1.4 cm should be given to white.
We can check our calculations by adding up the lengths: 3 + 2.8 + 2.8 + 1.4 = 10, which matches the length of the bar, so we know the segments are correct.
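The same calculation can be sketched in Python. The counts for green, pink and orange come from the fractions above; the count of 14 white jellybeans is inferred from the 1.4 cm segment on a 10 cm bar representing 100 jellybeans. The function name `segment_lengths` is our own.

```python
def segment_lengths(counts, bar_length):
    """Return the length of each segment: (count / total) * bar_length."""
    total = sum(counts.values())
    return {
        category: count * bar_length / total
        for category, count in counts.items()
    }

# Jellybean counts out of 100 (white is inferred from its 1.4 cm segment).
jellybeans = {"green": 30, "pink": 28, "orange": 28, "white": 14}

lengths = segment_lengths(jellybeans, bar_length=10)
for colour, length in lengths.items():
    print(f"{colour}: {length:.1f} cm")

# The segments should add up to the full length of the bar.
print(f"total: {sum(lengths.values()):.1f} cm")
```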
The divided bar graph shows the percentage of total subscriptions that each newspaper has. The Age has 54\,000 subscriptions:
What is 1\% of total subscriptions?
Find the total number of subscriptions.
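The general method for questions like these is to divide the known number by its percentage to find 1\%, then multiply by 100 to find the total. The sketch below shows that method only. The percentage used here (27) is a made-up value for illustration, since the real percentage has to be read from the graph, and the function name is our own.

```python
def total_from_share(part, percent):
    """Given that `part` is `percent` % of the whole, find 1% and the whole."""
    one_percent = part / percent
    total = one_percent * 100
    return one_percent, total

# 54 000 subscriptions is from the question; 27 is a HYPOTHETICAL percentage.
one_percent, total = total_from_share(54_000, 27)
print("1% of subscriptions:", one_percent)
print("Total subscriptions:", total)
```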
Histograms are similar to bar or column graphs. There are two main differences:
Bar or column graphs are usually used to display categorical data, while histograms are used to display numerical data.
Bar and column graphs are drawn with spaces between the columns, while histograms do not have spaces between the columns.
This is because histograms are used to display discrete or continuous numerical data. In other words, there are no distinct categories separating the groups. Instead, histograms display ranges of data that are determined by the person creating the graph. The width of each column in a histogram shows the interval that it represents.
Each student in a class was surveyed and asked about the colour of their eyes. The data is categorical and the results are displayed in a column graph below:
Each student in a class was surveyed and asked the size of their families. The data is numerical and the results are displayed in a histogram below:
The data that was collected in this survey is called discrete data because it can only take particular values (in this case whole numbers). In histograms that display discrete data, the mark for each value is located at the centre of its column along the horizontal axis. The height of each column represents the frequency of each data item.
Each student in a class was surveyed and asked their heights. The data is numerical and the results are displayed in a histogram below:
The data that was collected in this survey is called continuous data because it can take any value within a range. In histograms that display continuous data, the column width represents the range of each interval or bin. The height of each column represents the frequency of each data item within each interval.
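To see how continuous data is grouped into intervals, here is a small Python sketch. The heights are invented for illustration and are not the survey results shown in the graph. The choice of 10 cm intervals starting at 140 cm is also an assumption, which reflects the point that the person creating the graph decides the ranges.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each interval [low, low + width)."""
    counts = [0] * n_bins
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# HYPOTHETICAL student heights in centimetres.
heights = [142.5, 147.0, 150.0, 151.3, 153.8, 155.0, 156.2,
           158.9, 159.9, 160.0, 162.4, 164.1, 168.7, 171.0]

start, width = 140, 10
counts = bin_counts(heights, start, width, n_bins=4)

# Print a simple text histogram: one row per interval, no gaps between them.
for i, count in enumerate(counts):
    low = start + i * width
    print(f"{low}-{low + width} cm: {'#' * count} ({count})")
```

A height that sits exactly on a boundary, such as 150.0, is placed in the interval that starts at that value. This is one common convention, and other choices are possible.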
Continuous data is represented in a histogram as shown:
Complete the following frequency table:
Score | Frequency |
---|---|
21 | |
23 | |
25 | |
27 | |
29 | |
31 | |
When we describe the shape of data sets, we want to focus on how the scores are distributed. Some questions that we might be interested in include:
Is the distribution symmetrical or not?
Are there any clusters or gaps in the data?
Are there any outliers?
Where is the centre of the data located approximately? (Recall our three measures of centre: mean, median, and mode)
Is the data widely spread or very compact? (Recall our three measures of spread: range, interquartile range and standard deviation)
Data may be described as symmetrical or asymmetrical.
There are many cases where the data tends to be around a central value with no bias left or right. In such a case, roughly 50\% of scores will be above the mean and 50\% of scores will be below the mean. In other words, the mean and median roughly coincide.
The normal distribution is a common example of a symmetrical distribution of data.
A uniform distribution is a symmetrical distribution where each outcome is equally likely, so the frequency should be the same for each outcome. For example, when rolling a die, the outcomes are equally likely. We might get an irregular column graph if only a small number of rolls were performed, but if we continued to roll the die, the distribution would approach a uniform distribution like that shown below.
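The effect of the number of rolls can be explored with a short simulation. This is a sketch only: it uses Python's pseudo-random number generator to stand in for a fair six-sided die, and the numbers of rolls (30 and 60 000) are arbitrary choices.

```python
import random
from collections import Counter

def roll_counts(n_rolls, seed=0):
    """Simulate n_rolls of a fair die and return the frequency of faces 1-6."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    rolls = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return [rolls[face] for face in range(1, 7)]

for n in (30, 60_000):
    counts = roll_counts(n)
    proportions = [round(count / n, 3) for count in counts]
    print(f"{n} rolls: {proportions}")
```

With only 30 rolls the proportions are usually uneven, while with 60 000 rolls each proportion should sit close to \dfrac{1}{6}, which is about 0.167.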
If a data set is asymmetrical instead (i.e. it isn't symmetrical), it may be described as skewed.
A data set that has positive skew (sometimes called a 'right skew') has a longer tail of values to the right of the data set. The mass of the distribution is concentrated on the left of the figure.
A data set that has negative skew (sometimes called a 'left skew') has a longer tail of values to the left of the data set. The mass of the distribution is concentrated on the right of the figure.
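One informal way to check the direction of skew is to compare the mean with the median, since a long tail tends to pull the mean towards it. The sketch below uses this rule of thumb. It is a simplification with exceptions, not a formal definition of skew, and the three data sets and the tolerance of 0.25 are invented for illustration.

```python
from statistics import mean, median

def describe_shape(scores, tolerance=0.25):
    """Rule of thumb: compare the mean with the median."""
    difference = mean(scores) - median(scores)
    if abs(difference) <= tolerance:
        return "roughly symmetrical"
    return "positive skew" if difference > 0 else "negative skew"

# HYPOTHETICAL data sets.
symmetrical = [1, 2, 2, 3, 3, 3, 4, 4, 5]
tail_right = [1, 1, 1, 2, 2, 3, 4, 7, 10]
tail_left = [1, 4, 7, 8, 9, 9, 10, 10, 10]

for name, data in [("symmetrical", symmetrical),
                   ("tail_right", tail_right),
                   ("tail_left", tail_left)]:
    print(f"{name}: mean={mean(data):.2f}, median={median(data)}, "
          f"{describe_shape(data)}")
```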
In a set of data, a cluster occurs when a large number of the scores are grouped together within a small range. Clustering may occur at a single location or several locations. For example, annual wages for a factory may cluster around \$ 40\,000 for unskilled factory workers, \$ 55\,000 for tradespersons and \$ 70\,000 for management. The data may also have clear gaps where values are either very uncommon or not possible in the data set.
As we have seen previously, an outlier is a data point that varies significantly from the body of the data. An outlier will be a value that is either significantly larger or smaller than other observations. Outliers are important to identify as they point to unusual bits of data that may require further investigation and impact some calculations such as mean, range, and standard deviation.
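To see how a single outlier affects these calculations, the sketch below compares summary statistics for a small data set with and without one unusually large value. The scores are invented for illustration, and the population standard deviation (`pstdev`) is used as one possible choice of standard deviation.

```python
from statistics import mean, median, pstdev

def summarise(data):
    """Return some measures of centre and spread for a list of scores."""
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
        "standard deviation": pstdev(data),
    }

# HYPOTHETICAL scores; 60 is the outlier.
scores = [12, 14, 15, 15, 16, 17, 18]
with_outlier = scores + [60]

before = summarise(scores)
after = summarise(with_outlier)

for measure in before:
    print(f"{measure}: {before[measure]:.2f} -> {after[measure]:.2f}")
```

In this example the mean, range and standard deviation all change substantially, while the median barely moves.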
A distribution is said to be symmetric if its left and right sides are mirror images of one another.
To determine the modality of a data distribution, count the modal classes (the clear peaks in the graph):
If there is a single modal class, the data is uni-modal.
If there are two modal classes, the data is bi-modal.
If there are more than two, the data is multi-modal.
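For small sets of discrete scores, one simple approximation is to count how many values are tied for the highest frequency. The sketch below does this with `statistics.multimode` (available from Python 3.8). It is a simplification: it only counts exact ties for most frequent, so it will miss a second, lower peak that would be visible in a graph, and it would label a perfectly uniform data set as multi-modal. The example data sets are invented.

```python
from statistics import multimode

def modality(scores):
    """Classify data by the number of values tied for the highest frequency."""
    n_modes = len(multimode(scores))
    if n_modes == 1:
        return "uni-modal"
    if n_modes == 2:
        return "bi-modal"
    return "multi-modal"

# HYPOTHETICAL data sets.
print(modality([1, 2, 2, 3]))           # 2 is the only most frequent value
print(modality([1, 1, 2, 3, 3]))        # 1 and 3 are tied
print(modality([1, 1, 2, 2, 3, 3, 4]))  # 1, 2 and 3 are tied
```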
An outlier is a value that is either noticeably greater or smaller than other observations.