Bivariate Data

Lesson

Parallel box plots are used to compare two sets of data visually. When comparing box plots, the $5$ key data points are the important parts to compare. This $5$-number summary gives you:

- the lowest data point
- the highest data point
- the upper quartile
- the lower quartile and
- the median
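The five values listed above can be computed directly from raw data. The following is a minimal Python sketch, assuming the "median of each half" convention for quartiles (other conventions can give slightly different quartile values). The function name `five_number_summary` and the sample times are made up for illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (lowest, lower quartile, median, upper quartile, highest).

    Assumption: quartiles are the medians of the lower and upper halves of
    the sorted data. When the number of values is odd, the middle value is
    left out of both halves.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower_half = values[:half]
    upper_half = values[n - half:]
    return (
        values[0],
        median(lower_half),
        median(values),
        median(upper_half),
        values[-1],
    )


# Hypothetical task times in seconds (not the data behind the lesson's plots).
times = [12, 15, 16, 18, 19, 21, 22, 24, 30]
summary = five_number_summary(times)
print(summary)
```

Each box plot in a parallel display is drawn from one such summary, so computing it for both data sets gives everything needed for the comparison.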

Just like when we look at back-to-back stem and leaf plots, we can compare the spread of data in two box plots. We call these parallel box plots because they are drawn parallel to each other along the same number line. Both plots must therefore use the same scale, which makes a visual comparison fairly straightforward.

It is important to clearly label each box plot. Here I have plotted two sets of data, comparing the time it took two different groups of people to complete an online task.

You can see that overall the under $30$s were faster at completing the task. Both the under $30$s box plot and the over $30$s box plot are slightly negatively skewed. Over $75\%$ of the under $30$s completed the task in under $22$ seconds, which is the median time taken by the over $30$s. $100\%$ of the under $30$s had finished the task before $75\%$ of the over $30$s had completed it.

Overall, the under $30$s performed better and had a smaller spread of times. The over $30$ group had the larger spread, with a range of $24$ seconds compared to $20$ seconds for the under $30$s.
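For reference, the two measures of spread that can be read from a box plot can be written as follows. The symbols $Q_1$ and $Q_3$ for the lower and upper quartiles are introduced here for convenience and are not used elsewhere in this lesson.

$$\text{range} = \text{highest value} - \text{lowest value}, \qquad \text{interquartile range} = Q_3 - Q_1$$

As a made-up illustration, a box plot whose whiskers end at $10$ and $34$ has a range of $34 - 10 = 24$.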

When comparing two sets of data, we can compare the $5$ key points, as shown above. There are key questions you should ask yourself:

How do the spreads of data compare?

How do the skews compare? Is one set of data more symmetrical?

Is there a big difference in the medians?
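These questions can also be answered numerically once each box plot's five values are known. The sketch below is a rough illustration only: the function name `box_plot_measures`, the group names, and the two summaries are hypothetical. The skew check is a simple heuristic that compares how far the data stretches on each side of the median, not a formal test.

```python
def box_plot_measures(summary):
    """Derive comparison measures from a five-number summary.

    summary is (lowest, lower quartile, median, upper quartile, highest).
    """
    lowest, q1, med, q3, highest = summary
    left_tail = med - lowest     # how far the data stretches below the median
    right_tail = highest - med   # how far the data stretches above the median
    # Heuristic (an assumption of this sketch): the longer side marks the skew.
    if left_tail > right_tail:
        shape = "negatively skewed"
    elif right_tail > left_tail:
        shape = "positively skewed"
    else:
        shape = "roughly symmetrical"
    return {
        "median": med,
        "range": highest - lowest,
        "iqr": q3 - q1,
        "shape": shape,
    }


# Hypothetical five-number summaries, not read from the lesson's plots.
group_a = box_plot_measures((8, 15, 19, 21, 28))
group_b = box_plot_measures((12, 18, 22, 29, 36))

for name, m in (("Group A", group_a), ("Group B", group_b)):
    print(f"{name}: median {m['median']}, range {m['range']}, "
          f"IQR {m['iqr']}, {m['shape']}")
print("Difference in medians:", group_b["median"] - group_a["median"])
```

Comparing the `range` and `iqr` entries addresses the question about spread, the `shape` entries address skew, and the final line gives the difference in medians.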

The box plots show the distances, in centimetres, jumped by two high jumpers.

a) Who has a higher median jump?

Think: The median is shown by the line in the middle of the box. Whose median line has a higher value?

Do: John

b) Who made the highest jump?

Think: The highest jump is the end of the whisker for each jumper. Bill doesn't have an upper whisker as his highest jump was the same as the upper quartile height. Whose jump was the highest?

Do: John

c) Who made the lowest jump?

Think: The lowest jump is shown on each box plot by the lower whisker.

Do: Both John and Bill had a lowest jump of $60$ cm.

The box plots show the monthly profits (in thousands of dollars) of two derivatives traders over a year.

[Figure: parallel box plots for Ned and Tobias on a shared axis from 5 to 60]

Who made a higher median monthly profit?

A. Ned

B. Tobias

Whose profits had a higher interquartile range?

A. Tobias

B. Ned

Whose profits had a higher range?

A. Ned

B. Tobias

How much more did Ned make in his most profitable month than Tobias did in his most profitable month?

The two box plots below show the data collected by the manufacturers on the life-span of light bulbs, measured in thousands of hours.

Complete the following table using the two box plots. Write each answer in terms of hours.

| | Manufacturer A | Manufacturer B |
|---|---|---|
| Median | $\editable{}$ | $\editable{}$ |
| Lower Quartile | $\editable{}$ | $\editable{}$ |
| Upper Quartile | $\editable{}$ | $\editable{}$ |
| Range | $\editable{}$ | $\editable{}$ |
| Interquartile Range | $\editable{}$ | $\editable{}$ |

Hence, which manufacturer produces light bulbs with the best lifespan?

A. Manufacturer A

B. Manufacturer B

The box plots below represent the daily sales made by Carl and Angelina over the course of one month.

[Figure: parallel box plots of Angelina's Sales and Carl's Sales on a shared axis from 0 to 70]

What is the range in Angelina's sales?

What is the range in Carl’s sales?

By how much did Carl’s median sales exceed Angelina's?

Considering the middle $50\%$ of sales for both salespeople, whose sales were more consistent?

A. Carl

B. Angelina

Which salesperson had a more successful sales month?

A. Angelina

B. Carl

Plan and conduct investigations using the statistical enquiry cycle:

- A: justifying the variables and measures used
- B: managing sources of variation, including through the use of random sampling
- C: identifying and communicating features in context (trends, relationships between variables, and differences within and between distributions), using multiple displays
- D: making informal inferences about populations from sample data
- E: justifying findings, using displays and measures.

Investigate a given multivariate data set using the statistical enquiry cycle

Investigate bivariate numerical data using the statistical enquiry cycle