We have seen that the range and interquartile range both measure the spread of data, and both can be read from a boxplot. A boxplot also shows the median, the lower and upper quartiles, and sometimes extreme values.
Parallel boxplots are used to compare two sets of data visually. The data sets must use the same numerical variable, but for two different groups or categories.
It is important to clearly label each boxplot. Here are two parallel boxplots comparing the time it took two different groups of people to complete an online task.
The boxplots must be drawn on the same scale to properly compare them.
These two boxplots show the data collected by the manufacturers on the lifespan of light bulbs, measured in thousands of hours.
Complete this table using the two boxplots. Write each answer in terms of hours, remembering to multiply the values on the data display by 1000.
| | Manufacturer A | Manufacturer B |
|---|---|---|
| Median | | |
| Lower quartile | | |
| Upper quartile | | |
| Range | | |
| Interquartile range | | |
Which manufacturer produces light bulbs with the best lifespan?
Sophie and Holly have been playing soccer for 20 years. These boxplots represent the total number of goals Sophie and Holly scored in each of their 20 seasons.
Who had the highest-scoring season?
How many more goals did Holly score in her best season compared to Sophie in her best season?
What is the difference between the median number of goals scored in a season by each player?
Which player was more consistent?
The advertised fuel efficiency for 12 cars and 12 trucks was recorded in this table.
Cars | 15 | 17 | 18 | 22 | 22 | 22 | 23 | 25 | 26 | 31 | 35 | 50 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Trucks | 12 | 13 | 13 | 14 | 15 | 15 | 15 | 16 | 16 | 17 | 19 | 27 |
The car data was represented with a boxplot.
The truck data was represented with a dot plot.
Convert the dot plot to a boxplot and draw a parallel boxplot comparing cars to trucks.
Select a measure of center from both data sets to compare the fuel efficiency of cars versus trucks.
Compare the spread of the data sets.
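One way to check the comparison numerically is to compute the five-number summary of each data set. The sketch below assumes the median-of-halves convention for quartiles (each quartile is the median of the lower or upper half of the sorted data). Other conventions can give slightly different quartiles. The function name `five_number_summary` is illustrative and not part of the lesson.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]    # values before the median position
    upper = data[-half:]   # values after the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

# Data from the table above.
cars = [15, 17, 18, 22, 22, 22, 23, 25, 26, 31, 35, 50]
trucks = [12, 13, 13, 14, 15, 15, 15, 16, 16, 17, 19, 27]

for name, data in (("Cars", cars), ("Trucks", trucks)):
    low, q1, med, q3, high = five_number_summary(data)
    print(f"{name}: median={med}, IQR={q3 - q1}, range={high - low}")
```

Under this convention, the cars have a median of 22.5 and an interquartile range of 8.5, while the trucks have a median of 15 and an interquartile range of 3. The trucks have the lower typical value and the smaller spread.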
Different displays help us to identify different key features like the center and spread of a data set.
Let's start with the displays we have seen for categorical data.
For most numerical data displays, we can identify or estimate frequencies, shape, measures of center, measures of spread, and whether there are outliers.
Measures of center summarize the data set with a single value. We often use this to generalize which group performed better.
Measures of spread summarize the spread of a data set. We often use this to generalize which group was more consistent. The lower the spread the more consistent the data.
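As a small illustration of consistency, the sketch below compares two hypothetical groups that share the same median but differ in range. The data and the helper name `data_range` are made up for this example.

```python
from statistics import median

# Hypothetical task times in minutes (made-up data for illustration only).
group_a = [10, 11, 12, 12, 13, 14, 15]
group_b = [4, 8, 11, 12, 14, 18, 25]

def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

for name, data in (("A", group_a), ("B", group_b)):
    print(f"Group {name}: median={median(data)}, range={data_range(data)}")
```

Both groups have a median of 12, but group A has a range of 5 and group B a range of 21, so group A is the more consistent of the two.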
For numerical data, the shape shows us how the data is spread out and where there are clusters, peaks, or gaps.
From boxplots, we can only identify:
We should expect then that the shape of data would be the same whether it is represented in a boxplot or histogram.
Boxplots divide data into four sections, each containing about 25% of the values, using the lower extreme (minimum), lower quartile, median, upper quartile, and upper extreme (maximum).
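To see how the four parts of a boxplot share the data, the sketch below counts how many of the car fuel efficiency values from the earlier table fall in each part, and how wide each part is. It assumes the median-of-halves convention for quartiles, and the helper name `section_counts` is illustrative.

```python
from statistics import median

cars = [15, 17, 18, 22, 22, 22, 23, 25, 26, 31, 35, 50]

def section_counts(values):
    """Count the values in each of the four boxplot parts and measure each part's width."""
    data = sorted(values)
    half = len(data) // 2
    q1, med, q3 = median(data[:half]), median(data), median(data[-half:])
    cuts = [data[0], q1, med, q3, data[-1]]
    counts = [
        sum(1 for x in data if x < q1),
        sum(1 for x in data if q1 <= x < med),
        sum(1 for x in data if med <= x < q3),
        sum(1 for x in data if x >= q3),
    ]
    widths = [hi - lo for lo, hi in zip(cuts, cuts[1:])]
    return counts, widths

counts, widths = section_counts(cars)
print(counts)   # number of values in each part
print(widths)   # width of each part on the scale
```

For this data, each part holds 3 of the 12 values, yet the widths are 5, 2.5, 6, and 21.5. A long whisker or box therefore means the values are more spread out there, not that there are more of them. When several values tie with a quartile, the counts can differ slightly from an even split.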
A high school has scholarship programs for gymnastics and basketball. The histogram, dot plot, and boxplot summarize the heights of 29 students in a class, in inches:
What do you notice about the different displays for the same data?
What can you see from the histogram and dot plot that you can't see from the boxplot?
What can you see from the boxplot and dot plot that you can't see from the histogram?
Histograms, boxplots, and dot plots may display the same data, but the different displays have their own strengths and weaknesses.
Neither histograms nor boxplots show every individual data value, but histograms will show intervals where there may be gaps or a lower frequency of data.
Boxplots provide a quick, efficient overall view of the shape, center and spread of the data if we're not interested in where there may be gaps in the data.
Some people choose data displays that can be misleading. It's important to choose a data display that shows a true picture of the data.
Determine the best type of data display(s) for each statistical question:
How much variation is there in the number of zucchinis produced by a single plant?
What types of vegetables are most popular to grow among urban gardeners?
Shown are the quiz score percentages from Mr. Sanchez's first period math class: $\left\{20,\,25,\,26,\,30,\,30,\,40,\,43,\,63,\,65,\,67,\,70,\,70,\,75,\,90,\,93\right\}$
Construct a boxplot of the quiz scores.
What are the advantages and disadvantages of a boxplot?
Explain whether a dot plot or a histogram could be a better display for the data.
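To weigh a histogram against the boxplot for these scores, it can help to tally the scores in equal-width intervals. The sketch below assumes intervals of width 10 starting at 20. That width is a choice made for illustration, since the lesson does not fix one.

```python
from collections import Counter

scores = [20, 25, 26, 30, 30, 40, 43, 63, 65, 67, 70, 70, 75, 90, 93]
WIDTH = 10  # assumed interval width

# Map each score to the lower edge of its interval, e.g. 43 -> 40.
tally = Counter((s // WIDTH) * WIDTH for s in scores)

# Print one row of stars per interval, including empty intervals.
for start in range(20, 100, WIDTH):
    count = tally.get(start, 0)
    print(f"{start}-{start + WIDTH - 1}: {'*' * count}")
```

The rows for 50-59 and 80-89 come out empty. Those gaps, and the two clusters on either side of the first gap, are visible in a histogram or dot plot but not in a boxplot of the same scores.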
Match the boxplot shown to the correct histogram.
We should expect that the shape of data would be the same whether it is represented in a boxplot or histogram.
The best display for a data set is one that reveals the information we want to share. Some displays hide key information like the individual data points, the total number of data points, or features like the shape, clusters, gaps, and spread.
As a starting place, consider:
For numerical data with only a few values and a small range, try a dot plot.
If the data set is large or has a wide range, try a histogram or boxplot.
Choose a boxplot if you only need to see an overview of center, spread and shape.
Choose a histogram if, in addition to center, spread, and shape, you want to know the size of the data set and view any gaps or clusters among various intervals.