
3.025 Displaying grouped data sets

Lesson

Displays for grouped data

Continuous numerical data, such as times, heights, weights or temperatures, are based on measurements, so any value within a large range is possible. Because the range of values can be quite large, it is often more practical and efficient to organise the raw data into groups, or class intervals, of equal width in a frequency distribution table.

The class centre is the average of the endpoints of each interval.

For example, if the class interval is $45-50$, the class centre is calculated as follows:

Class centre $= \frac{45+50}{2} = 47.5$
 

Because the class centre is an average of the endpoints, it is often used as a single value to represent the class interval.

As an example, the following frequency distribution table and histogram represent the times taken for $72$ runners to complete a ten kilometre race.

Class interval | Class centre | Frequency
$45-50$ | $47.5$ | $9$
$50-55$ | $52.5$ | $7$
$55-60$ | $57.5$ | $20$
$60-65$ | $62.5$ | $30$
$65-70$ | $67.5$ | $6$
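Because a class centre stands in for every value in its interval, the table can be summarised with a few lines of arithmetic. The Python sketch below is illustrative only: it recomputes the class centres, checks that the frequencies account for all $72$ runners, and estimates the mean time by weighting each class centre by its frequency. The names `runner_classes` and `class_centre` are hypothetical, and the mean is only an estimate because the raw times are not known.

```python
# Runners' times from the frequency table: (lower endpoint, upper endpoint, frequency)
runner_classes = [(45, 50, 9), (50, 55, 7), (55, 60, 20), (60, 65, 30), (65, 70, 6)]


def class_centre(lower: float, upper: float) -> float:
    """Return the class centre: the average of the interval's endpoints."""
    return (lower + upper) / 2


centres = [class_centre(lower, upper) for lower, upper, _ in runner_classes]
total_runners = sum(freq for _, _, freq in runner_classes)

# Estimated mean: treat every value in a class as if it sat at the class centre
estimated_mean = sum(
    class_centre(lower, upper) * freq for lower, upper, freq in runner_classes
) / total_runners

print(centres)                   # [47.5, 52.5, 57.5, 62.5, 67.5]
print(total_runners)             # 72
print(round(estimated_mean, 2))  # 58.68
```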

 

Remember!
  • Every data value must go into exactly one class interval
  • Each class interval must be the same width, e.g. $0-5$, $5-10$, $10-15$, ... or $10-20$, $20-30$, $30-40$, ...
  • The class centre is the average of the end points of the class interval
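One way to make sure every value lands in exactly one class is to treat each interval as including its lower endpoint but not its upper endpoint, so that a time of exactly $50$ goes into $50-55$ rather than $45-50$. This half-open convention is an assumption; the lesson does not say how boundary values are handled. The Python sketch below uses the standard-library `bisect` module to count values into equal-width classes; the function name `build_frequency_table` and the sample times are hypothetical.

```python
from bisect import bisect_right


def build_frequency_table(values, start, width, n_classes):
    """Count values into equal-width classes [lower, lower + width).

    Each class includes its lower endpoint but not its upper endpoint
    (an assumed convention), so every value falls into exactly one class.
    """
    edges = [start + i * width for i in range(n_classes + 1)]
    counts = [0] * n_classes
    for value in values:
        if not edges[0] <= value < edges[-1]:
            raise ValueError(f"{value} lies outside the class intervals")
        # bisect_right returns the position after the last edge <= value,
        # so subtracting 1 gives the index of the class containing value
        counts[bisect_right(edges, value) - 1] += 1
    return [(edges[i], edges[i + 1], counts[i]) for i in range(n_classes)]


# Hypothetical race times, including values that sit exactly on a boundary
sample_times = [45.0, 49.9, 50.0, 52.3, 64.9, 65.0]
table = build_frequency_table(sample_times, start=45, width=5, n_classes=5)
print(table)
# [(45, 50, 2), (50, 55, 2), (55, 60, 0), (60, 65, 1), (65, 70, 1)]
```

Because the classes share endpoints but each excludes its upper one, the counts always add up to the number of values.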

Practice question

Question 1

Lachlan wanted to see whether his car’s fuel consumption was the same as the fuel consumption stated by the car company. He measured his fuel consumption (in litres) over several journeys of equal distance.

$9.1, 9.1, 9.1, 9.1, 9.2, 9.2, 9.3, 9.4, 9.5, 9.5, 9.6, 9.7, 9.7, 9.8, 9.8, 9.8, 9.8, 9.9, 10.1, 10.1, 10.1, 10.1, 10.2, 10.2, 10.2, 10.2, 10.2, 10.2, 10.3, 10.4, 10.4, 10.5, 10.6, 10.6, 10.7, 10.8, 10.9, 10.9, 10.9, 10.9$

  1. Organise these results into a frequency table.

    Fuel consumption (litres) | Class centre | Frequency
    $9 - 9.4$ | ___ | ___
    $9.5 - 9.9$ | ___ | ___
    $10 - 10.4$ | ___ | ___
    $10.5 - 10.9$ | ___ | ___
  2. How many times did Lachlan measure the fuel consumption?

  3. What is the modal class?

    ___ - ___ litres

  4. Construct a frequency histogram for the data.

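    [Blank histogram grid: horizontal axis "Score" with classes 9 - 9.4, 9.5 - 9.9, 10 - 10.4 and 10.5 - 10.9; vertical axis "Frequency" marked at 5, 10 and 15]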

  5. Using the class centres, estimate the average fuel usage per trip to two decimal places.

Frequency polygon

To draw a frequency polygon for the running-times example, we use the class centres from the frequency table on the horizontal axis.

Class interval | Class centre | Frequency
$45-50$ | $47.5$ | $9$
$50-55$ | $52.5$ | $7$
$55-60$ | $57.5$ | $20$
$60-65$ | $62.5$ | $30$
$65-70$ | $67.5$ | $6$

Notice that the class centres have been used as the scale on the horizontal axis. Each point on the frequency polygon is a coordinate pair made up of the class centre and the frequency: (class centre, frequency). 
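The points of a frequency polygon can be read straight from the table by pairing each class centre with its frequency. The Python sketch below does this for the running-times table; the names `runner_classes` and `polygon_points` are hypothetical. Joining consecutive points with straight line segments gives the polygon.

```python
# Runners' times table: (lower endpoint, upper endpoint, frequency)
runner_classes = [(45, 50, 9), (50, 55, 7), (55, 60, 20), (60, 65, 30), (65, 70, 6)]


def polygon_points(classes):
    """Return the (class centre, frequency) pairs plotted on a frequency polygon."""
    return [((lower + upper) / 2, freq) for lower, upper, freq in classes]


points = polygon_points(runner_classes)
print(points)
# [(47.5, 9), (52.5, 7), (57.5, 20), (62.5, 30), (67.5, 6)]
```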

Misuse of statistical graphs

Just as statistical measures can be used inappropriately in the media to influence people, graphs can also be used in various ways to convey a certain message. 

Some common features of graphs that may lead to incorrect interpretations are:

  1. Omitting the baseline
  2. Showing an inappropriate or irregular scale
  3. Scale or labels not clearly given
  4. Leaving data out
  5. Using pictures or three-dimensional graphics that distort differences
  6. Using the wrong graph for a given data type

Let's look at an example of each characteristic above.

1. Omitting the baseline

Not starting the vertical axis at zero can give the impression that there is a significant difference between values when in fact there may be very little. This is referred to as a truncated graph.

Misleading: this graph has a truncated axis that exaggerates the difference between the parties. It appears that twice as many Liberal party members as Labor party members are in favour of the bill.

Accurate: this graph's vertical axis starts at the zero baseline and shows that there is only a small difference in the proportion in favour of the bill.

This is not to say that the vertical axis must always start at zero, as the next example shows. However, care should be taken not to exaggerate differences. When the vertical axis does not start at zero, a clear scale break should be shown, and a truncated axis is not recommended for graphs such as column graphs, where the viewer compares the relative sizes of the columns.
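The effect of a truncated axis can be measured by comparing the ratio of the drawn bar heights with the ratio of the actual values. In the Python sketch below, the percentages $52\%$ and $48\%$ and the axis starting at $44\%$ are hypothetical values chosen for illustration, not values read from the graphs above; the function name `drawn_height_ratio` is also hypothetical.

```python
def drawn_height_ratio(a, b, axis_start):
    """Ratio of two bars' drawn heights when the vertical axis starts at axis_start."""
    return (a - axis_start) / (b - axis_start)


# Hypothetical percentages in favour of a bill for two groups
group_a, group_b = 52, 48

true_ratio = drawn_height_ratio(group_a, group_b, 0)        # axis starts at zero
truncated_ratio = drawn_height_ratio(group_a, group_b, 44)  # axis starts at 44

print(round(true_ratio, 2))  # 1.08
print(truncated_ratio)       # 2.0
```

With the axis starting at zero, one bar is only about 8% taller than the other; starting the axis at 44 makes it look twice as tall.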

2. Showing an inappropriate or irregular scale

The scale given for the graph should be even and in proportion to the data. The scale should not be compressed or expanded to exaggerate or diminish change. 

Misleading: the range of values shown on the vertical axis is disproportionate to the data. This makes it appear as though the temperature is constant, with no variation and no overall increasing trend.

Accurate: the range of values shown on the vertical axis is in proportion to the data. We can now see that the data has high variability and an overall increasing trend.

The three graphs above show the same data, but compressing or expanding the horizontal axis can make the trend appear either gradual or quite abrupt.
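How dramatic a change looks depends on how much of the axis it occupies. The Python sketch below computes the length on paper that a change in the data takes up on an axis of a given length; the function name `drawn_length_cm`, the $1$ degree rise and the two axis ranges are hypothetical values chosen for illustration.

```python
def drawn_length_cm(change, axis_min, axis_max, axis_length_cm):
    """Length on paper (in cm) that a change in the data occupies on an axis."""
    units_per_cm = (axis_max - axis_min) / axis_length_cm
    return change / units_per_cm


# Hypothetical: a 1 degree rise drawn on a 10 cm vertical axis
wide_range = drawn_length_cm(1, axis_min=0, axis_max=40, axis_length_cm=10)
narrow_range = drawn_length_cm(1, axis_min=14, axis_max=16, axis_length_cm=10)

print(wide_range)    # 0.25
print(narrow_range)  # 5.0
```

The same rise covers a quarter of a centimetre on the wide scale but five centimetres on the narrow one, which is why the range chosen for an axis should be in proportion to the data.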

3. Scale or labels not clearly given

If a scale is omitted or units are missing, the reader cannot judge whether the trend shown is significant.

Without a scale and units on the vertical axis, we cannot tell whether the sales are measured in single units sold, lots of $100$ units sold, dollars or something else, so we have no way to judge whether the difference and upward trend shown are significant. The axis may also be truncated.

4. Leaving data out

A common way to mislead the audience is to cherry-pick the data shown, so it only includes statistics that support a particular conclusion. Whenever a broad range of information exists, appearances can be manipulated by highlighting some facts and ignoring others. 

This graph shows only a few months of data and displays an overall negative trend. When the data is expanded to cover three years, we see more variability and an overall upward trend.

If we select only the temperature data from 1997 to 2012, we see a relatively stable temperature with no obvious upward trend. Compare this with the temperature graph given earlier.
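A least-squares trend line makes the effect of the chosen time window easy to quantify. The Python sketch below fits a straight line to a hypothetical series of monthly sales (not the data in the graphs above) and shows that the full series trends upward while its last four months trend downward. The helper `trend_slope` is hypothetical; it computes the standard least-squares slope against the time index.

```python
def trend_slope(values):
    """Least-squares slope of values plotted against 0, 1, 2, ... (change per period)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx


# Hypothetical monthly sales over 16 months
monthly_sales = [10, 12, 11, 14, 13, 16, 15, 18, 17, 20, 19, 22, 21, 20, 19, 18]

print(round(trend_slope(monthly_sales), 2))       # 0.68 (upward overall)
print(round(trend_slope(monthly_sales[-4:]), 2))  # -1.0 (downward in the last 4 months)
```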

5. Using pictures or three-dimensional graphics that distort differences

Using graphics in perspective can make the section of the graph at the front appear larger than sections further back, even when they are the same size. Improperly scaling graphics can also significantly distort the differences between categories.

In this column graph, the front bar looks far larger than the bar at the back due to perspective. However, both columns represent the same value. Three-dimensional bars can also make the scale harder to read.

 

Here again, we see a three-dimensional graphic in perspective, with one piece pulled forward from the chart. The perspective makes the front slice of 7% look far larger than the equal-valued slice at the rear. When the same data is shown without the perspective, we can clearly compare the sizes of the slices: the two equal-valued slices now appear the same size.
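Improper scaling is easy to quantify. If a picture is enlarged in every direction to represent a value that is twice as large, its area, or for a three-dimensional object its volume, grows by far more than a factor of two. The Python sketch below computes this growth factor; the function name `size_factor` is hypothetical.

```python
def size_factor(value_ratio, dimensions_scaled):
    """Factor by which a picture's size grows when each of its
    dimensions_scaled dimensions is multiplied by value_ratio."""
    return value_ratio ** dimensions_scaled


# A category whose value is twice another's
print(size_factor(2, 1))  # 2: only the height is stretched, matching the data
print(size_factor(2, 2))  # 4: height and width both doubled, so the area is 4 times larger
print(size_factor(2, 3))  # 8: a 3D object doubled in every direction has 8 times the volume
```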

6. Using the wrong graph for a given data type

The type of graph used to visualise the data depends on the type of data you have and the characteristics of the data you wish to highlight. Choosing an inappropriate graph type may lead the reader to misinterpret the data.

Pie charts are used to show the composition of a whole, not to compare across groups. In this pie chart the percentages add to more than $100\%$, which suggests that respondents to the survey could select more than one option. When the data is represented as a bar chart, we can make a clear comparison across all the categories.
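A quick check before drawing a pie chart is whether the percentages add to about $100\%$; a total well above that means the categories overlap and are not parts of one whole. A total near $100\%$ is necessary but not sufficient for a pie chart to be appropriate. In the Python sketch below, the function name `suits_pie_chart`, the rounding tolerance of $0.5$ percentage points and the survey results are all hypothetical.

```python
def suits_pie_chart(percentages, tolerance=0.5):
    """Check whether percentages could be parts of one whole (total about 100%).

    The tolerance allows for rounding; 0.5 percentage points is an assumed value.
    """
    return abs(sum(percentages) - 100) <= tolerance


# Hypothetical survey results
single_choice = [45, 30, 25]     # each respondent picked one option
multi_choice = [60, 45, 30, 20]  # respondents could pick several options

print(suits_pie_chart(single_choice))  # True
print(suits_pie_chart(multi_choice))   # False: the total is 155%
```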

Practice question

Question 2

Refer to the graph to answer the following questions.

Source: Brookings report on American education

  1. What is a fault with this graph?

    A. By cropping the bottom section of the graph, the author has made the decrease in math scores appear larger than it really is
    B. By cropping the bottom section of the graph, the author has made the increase in math scores appear larger than it really is
    C. By cropping the bottom section of the graph, the author has made the decrease in math scores appear smaller than it really is
    D. By cropping the bottom section of the graph, the author has made the increase in math scores appear smaller than it really is
  2. What is another fault with this graph?

    A. The labels on the vertical axis are not evenly spaced
    B. The labels on the horizontal axis are not evenly spaced
    C. The graph does not have a scale break
  3. Why is this a problem?

    A. It has made the increases in the 4-year intervals 1992-1996 and 1996-2000 appear faster than they really are (relative to the rate in the 2-year interval 1990-1992)
    B. It has made the increases in the 4-year intervals 1992-1996 and 1996-2000 appear slower than they really are (relative to the rate in the 2-year interval 1990-1992)
    C. It has made the line segment for the 1990-1992 interval appear steeper than it should be
    D. It is not a problem

Outcomes

MA12-8

solves problems using appropriate statistical processes
