9.03 Displaying and interpreting univariate data

Lesson

Concept summary

Data displays help us summarize, analyze, and interpret a data distribution. The best data displays make useful information easy to read for the intended audience.

Frequency tables can be used to organize both categorical and numerical data.

Favorite color    Frequency
red               5
blue              3
green             10
yellow            20
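
As a minimal sketch of how a frequency table like this one can be tallied, the Python below counts a list of responses. The `responses` list is an assumption, rebuilt from the counts in the table rather than taken from real survey data.

```python
from collections import Counter

# Assumed raw responses, rebuilt from the table's counts for illustration.
responses = ["red"] * 5 + ["blue"] * 3 + ["green"] * 10 + ["yellow"] * 20

# Counter tallies how many times each category appears.
frequency = Counter(responses)

print(f"{'Favorite color':<18}Frequency")
for color, count in frequency.items():
    print(f"{color:<18}{count}")
```

The same approach works for numerical data, with each distinct value (or each interval) acting as a category.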

Categorical data can be displayed using circle graphs or bar graphs.

A circle graph is divided into 5 sectors of different colors and sizes. Each sector is labeled with the percentage for its category, and a legend shows each category and its color. The data is as follows: Car, 70%; Train, 7%; Bicycle, 11%; Bus, 9%; Walk, 3%.

In a circle graph, each piece represents the percentage or proportion of a category, and all pieces add to 100%.
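
To see how counts become the percentages in a circle graph, here is a short Python sketch. It uses the counts from the favorite color frequency table; the variable names are our own.

```python
# Counts taken from the favorite color frequency table.
frequency = {"red": 5, "blue": 3, "green": 10, "yellow": 20}

total = sum(frequency.values())

# Each sector's share is its count divided by the total, as a percentage.
percentages = {color: 100 * count / total for color, count in frequency.items()}

for color, percent in percentages.items():
    print(f"{color:<8}{percent:.1f}%")

# The sectors of a circle graph should add to 100% (allowing for float rounding).
assert abs(sum(percentages.values()) - 100) < 1e-9
```

Note that rounding each percentage for display can make the printed values add to slightly more or less than 100%, even though the exact shares always total 100%.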

A bar graph titled "What type of pet do you have?" The vertical axis has no title and is scaled from 0 to 8 in intervals of 2. The horizontal axis has no label and lists the categories. A vertical bar is shown for each category, with its height representing the frequency: Fish, 2; Dog, 6; Cat, 4; Lizard, midway between 0 and 2.

In a bar graph, the height of each bar represents the frequency of that category.

Numerical data can be displayed using histograms, box plots, line plots, and stem-and-leaf plots.

Histograms and box plots are good for large quantities of data. The shape, center, and spread of the data are easy to read, but the individual data values are lost in these types of displays.

Histograms display the frequency of data as either a count or relative proportion along the y-axis and divide the numerical data into bins of equal width along the x-axis.

When constructing a histogram:

  • There should be no gaps between the bars

  • The bin range excludes the value on the left side and includes the value on the right side (the first bin includes both of its endpoints)

  • All bins should be the same width visually and numerically

A histogram titled Annual Rainfall in inches. The y-axis is titled Number of Cities. The x-axis is titled Rainfall in inches and shows bars of different heights over bins of the same width, both visually and numerically. The bins and bar heights are as follows: 1 to 5 has height 10; 5 to 9 has height 22; 9 to 13 has height 46; 13 to 17 has height 48; 17 to 21 has height 18; 21 to 25 has height 9.

Generally we exclude the lower bound and include the upper bound, so the equivalent labels for the x-axis would be:

A histogram titled Annual Rainfall in inches. The y-axis is titled Number of Cities. The x-axis is titled Rainfall in inches and shows bars of different heights over bins of the same width, both visually and numerically. The bins are labeled in interval notation, with bar heights as follows: [1, 5], height 10; (5, 9], height 22; (9, 13], height 46; (13, 17], height 48; (17, 21], height 18; (21, 25], height 9.
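
The interval labels matter most for values that sit exactly on a bin boundary. As a minimal Python sketch, assuming the convention in the labels above (each bin excludes its lower bound and includes its upper bound, except the first bin, which includes both), the hypothetical helper `bin_label` finds the bin for a given rainfall value:

```python
def bin_label(value, edges):
    """Return the label of the bin containing value, or None if out of range.

    Assumes bins of the form (lower, upper], with the first bin closed
    on both ends so the smallest edge is still counted.
    """
    if value == edges[0]:
        return f"[{edges[0]}, {edges[1]}]"
    for i, (lower, upper) in enumerate(zip(edges, edges[1:])):
        if lower < value <= upper:
            left = "[" if i == 0 else "("
            return f"{left}{lower}, {upper}]"
    return None


# Bin edges from the Annual Rainfall histogram.
edges = [1, 5, 9, 13, 17, 21, 25]

print(bin_label(5, edges))    # a boundary value goes in the lower bin
print(bin_label(5.1, edges))  # just past the boundary goes in the next bin
```

Under this convention a city with exactly 5 inches of rain is counted in the first bar, not the second. Software packages do not all follow the same convention, so it is worth checking which one a tool uses.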

Box plots divide data into four sections, each containing about one quarter of the data values, using the five-number summary: minimum, lower quartile, median, upper quartile, and maximum.

When constructing a box plot:

  • The box in the center extends from the lower quartile to the upper quartile, so it represents the middle half of the data and shows its spread

  • The line in the box represents the median

  • The two endpoints represent the minimum and maximum value

  • The total count of data is not included in the box plot
A box plot on a number line ranging from 0 to 40 with an interval of 2. A line extends from 4 to 7, a box extends from 7 to 28 with a median plotted at 10 represented by a vertical segment in the box. A line extends from 28 to 34.
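
The five-number summary behind a box plot can be computed as in the Python sketch below. Two assumptions are worth stating: the data set is hypothetical, chosen so that its summary matches the box plot described above, and the quartiles are found as the medians of the lower and upper halves (leaving out the middle value when the count is odd). Other textbooks and software use slightly different quartile rules.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]          # values below the median position
    upper_half = values[half + n % 2:]  # skip the middle value when n is odd
    return (
        values[0],
        median(lower_half),
        median(values),
        median(upper_half),
        values[-1],
    )


# Hypothetical data chosen to match the box plot described above.
data = [4, 7, 9, 10, 20, 28, 34]

summary = five_number_summary(data)
print(summary)  # (4, 7, 10, 28, 34)

minimum, q1, med, q3, maximum = summary
print("Range:", maximum - minimum)        # 34 - 4 = 30
print("Interquartile range:", q3 - q1)    # 28 - 7 = 21
```

Many different data sets share this same summary, which is why the total count and the individual values cannot be recovered from a box plot.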

Since every individual data point must be included in a line plot and stem-and-leaf plot, these data displays are best for smaller data sets.

Line plots display the frequency of data by the number of dots at each value. This display is best used for discrete values with a small range.

When constructing a line plot:

  • The number line must count by equal amounts with no gaps or "breaks"
  • If a value on the number line is empty it means no data exist for that value
  • Dots or X's represent each individual data point and should be stacked with equal size.
A line plot with no title is shown. X's representing individual data points are stacked with equal size. The number line ranges from 1 to 3 in intervals of one fourth. The number of X's at each value is as follows: at 1, 2; at 1 and 1 fourth, 3; at 1 and 1 half, 2; at 1 and 3 fourths, 1; at 2, 3; at 2 and 1 fourth, 3; at 2 and 1 half, 2; at 2 and 3 fourths, 1; and at 3, 1.
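
The rules for a line plot can be sketched in Python as a simple text display. The data set here is hypothetical, and `line_plot_rows` is our own helper name. Fractions are used so that the number line counts by exact fourths.

```python
from collections import Counter
from fractions import Fraction


def line_plot_rows(data, start, stop, step):
    """Return (tick, count) pairs for every tick from start to stop.

    Ticks with no data are kept with a count of 0, so the number line
    has no gaps or breaks.
    """
    counts = Counter(data)
    rows = []
    tick = start
    while tick <= stop:
        rows.append((tick, counts.get(tick, 0)))
        tick += step
    return rows


# Hypothetical measurements, recorded to the nearest fourth.
data = [Fraction(1), Fraction(1), Fraction(5, 4),
        Fraction(7, 4), Fraction(7, 4), Fraction(2)]

rows = line_plot_rows(data, Fraction(1), Fraction(2), Fraction(1, 4))
for tick, count in rows:
    # One X per data point; an empty row means no data at that value.
    print(f"{str(tick):>4} | {'X' * count}")
```

In this sketch the row for 3/2 is printed with no X's, showing that a value stays on the number line even when no data exist for it.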
A stem-and-leaf plot. The left column is titled Stem, with numbers 0 through 6, and the right column is titled Leaf. The data is as follows: at stem 0, the leaves are 1, 4, 5, and 8; at stem 1, the leaves are 0, 1, and 3; at stem 2, the leaves are 7, 7, and 8; at stem 3, the leaves are 0, 1, and 1; at stem 4, the leaves are 0 and 4; at stem 5, there are no leaves; at stem 6, the leaves are 3 and 6.

Stem-and-leaf plots organize data by place value to compare frequencies and are best for data values with a larger range.

When constructing a stem-and-leaf plot:

  • In this case the "stem" represents the hundreds and tens digits together and the "leaf" represents the ones digit
  • Both the stem column and each leaf row should be in increasing order
  • No stem values can be skipped. If there is no data for that particular stem, the leaf column will have an empty row.
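
The construction rules above can be sketched in Python as follows. The data values are read from the stem-and-leaf plot described earlier; the helper name `stem_and_leaf` is our own, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return a list of (stem, leaves) rows with no stems skipped.

    Assumes non-negative whole numbers: the leaf is the ones digit and
    the stem is every digit to its left.
    """
    leaves_by_stem = defaultdict(list)
    for value in sorted(data):           # sorting keeps each leaf row in order
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    stems = range(min(leaves_by_stem), max(leaves_by_stem) + 1)
    # Stems with no data still get a row, with an empty list of leaves.
    return [(stem, leaves_by_stem.get(stem, [])) for stem in stems]


# Values read from the stem-and-leaf plot described earlier.
data = [1, 4, 5, 8, 10, 11, 13, 27, 27, 28, 30, 31, 31, 40, 44, 63, 66]

rows = stem_and_leaf(data)
for stem, leaves in rows:
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

The row for stem 5 is printed with no leaves, matching the rule that no stem values can be skipped.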

Worked examples

Example 1

Determine the best type of data display(s) for each set of data.

a

In order to determine the popularity of each meal, the number of classmates who like tacos, pizza, or cheeseburgers the most was collected.

Approach

First consider if the data is categorical or numerical, then consider the information being displayed.

Solution

A circle graph or bar graph would be the best display.

Reflection

Categorical data can be displayed with frequency tables, circle graphs, and bar graphs. Since we want to see the relative popularity of each meal, we can rule out a frequency table as the best display. Note that a circle graph may not show a viewer the total number of responses, or even the count of each response, depending on how it is labeled.

b

The heights of all 196 Olympic gymnasts in the most recent summer games.

Approach

First consider if the data is categorical or numerical. Since height is numerical we want to decide which graphs are easiest to construct based on the size of the data set, and also the range of its values.

Solution

Histogram or box plot

Reflection

This is a very large data set. It would be inefficient to try to construct a line plot with a dot for every individual athlete, or even a stem-and-leaf plot with a leaf for every individual athlete.

Example 2

Use the data display to answer each question.

A box plot titled Mrs. Sanchez's Period 1 Math Quiz Results is shown on a number line ranging from 0 to 100 in intervals of 10. A line extends from 20 to 30, and a box extends from 30 to 70 with the median represented by a vertical segment between 60 and 70. A line extends from 70 to a number between 90 and 100.

a

Estimate the median score from the math quiz.

Approach

The median is represented by the vertical line in the box of the box plot.

Solution

Around 63%

Reflection

The median is used to describe the center of the data.

b

Estimate the range and interquartile range of the data.

Approach

The range of the data is the difference between the maximum and minimum values. The interquartile range is the difference between the upper and lower quartiles.

Solution

The range is approximately 93% - 20% = 73% and the interquartile range is approximately 70% - 30% = 40%.

Reflection

The range and interquartile range are used to describe the spread of the data.

c

What are the disadvantages of this box plot?

Solution

We don't know how many students are in Mrs. Sanchez's class and we can't see any individual scores from the quiz.

d

What are the advantages of a box plot?

Solution

Box plots visually summarize large sets of data, although they can be used for small sets too. In a box plot, it is easy to see and estimate the shape, center, and spread of the data.

Outcomes

MA.912.DP.1.1

Given a set of data, select an appropriate method to represent the data, depending on whether it is numerical or categorical data and on whether it is univariate or bivariate.

MA.912.DP.1.2

Interpret data distributions represented in various ways. State whether the data is numerical or categorical, whether it is univariate or bivariate and interpret the different components and quantities in the display.
