7.06 Describe distributions

Worksheet
Symmetry and skew
1

The table shows the number of crime novels in a bookshop for different price ranges, rounded to the nearest \$5:

a

Plot this data as a histogram.

b

Describe the shape of the distribution of the data.

Price of crime novel | Frequency
\$5 | 5
\$10 | 10
\$15 | 17
\$20 | 8
\$25 | 17
\$30 | 10
\$35 | 5
2

Describe the shape of the data in the following graphs:

a
b
c
d
Stem | Leaf
1 | 6 7 7
2 | 2 2 2 2 3 3 3
3 | 3 3 3 6 6 6 7 7 7 7 7
4 | 4 4 4 4 4 4
5 | 7 7

Key: 2 \vert 3 = 23

e
f
g
h
i
3

If a set of data is strongly positively skewed and the median is 70, what can we conclude about the mean?

4

A die is rolled for a large number of trials and the number appearing is noted.

Which histogram would you expect to match the data? Explain your answer.

A
B
5

A pair of dice are rolled and the numbers appearing on the uppermost faces are added to create a score.

a

How many combinations would result in a score of:

i

4

ii

7

iii

10

b

A pair of dice are rolled for a large number of trials and the numbers appearing are added to create a score. Draw a sketch of a histogram you would expect to match the data.
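
One way to check the counts in part (a) and to preview the sketch in part (b) is to enumerate every outcome. The snippet below is only a sketch: it assumes two fair six-sided dice and treats a "combination" as an ordered pair, so (1, 3) and (3, 1) are counted separately. The name `score_counts` is illustrative.

```python
from collections import Counter
from itertools import product

# Enumerate all 36 ordered outcomes (first die, second die) of two six-sided dice
# and count how many outcomes give each total score.
score_counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for score in sorted(score_counts):
    # Text histogram: one '#' per outcome that produces this score.
    print(f"{score:>2}: {'#' * score_counts[score]}")
```

Over a large number of trials, the observed frequencies should be roughly proportional to these counts, which is why the expected histogram is symmetric about a score of 7.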

Connect histograms and box plots
6

Construct a box plot for the following histograms:

a
b
c
d
e
f
7

Match the histograms on the left to the corresponding box plots on the right:

Box Plot 1 (axis from 10 to 90)
Box Plot 2 (axis from 0 to 10)
Box Plot 3 (axis from 0 to 10)

Histogram A

Histogram B

Histogram C

Box Plot 4 (axis from 1 to 9)
Box Plot 5 (axis from 0 to 100)
Box Plot 6 (axis from 0 to 100)

Histogram D

Histogram E

Histogram F

8

State whether the following pairs of histograms and box plots match with respect to their shape:

a
b
c
d
e
f
9

Explain why the following pairs of histograms and box plots do not match:

a
b
Modality, clusters and outliers
10

Identify any outliers in each of the following data sets:

a
73,\, 77,\, 81,\, 86,\, 131
b
7,\, 25,\, 28,\, 35,\, 42
c
69,\, 79,\, 86,\, 72,\, 86,\, 77,\, 73,\, 82,\, 81,\, 76,\, 83,\, 47,\, 87,\, 70,\, 80,\, 85
d
58,\, 63,\, 58,\, 59,\, 64,\, 68,\, 68,\, 30,\, 73,\, 25,\, 72,\, 61,\, 65,\, 69,\, 75,\, 72
e
Stem | Leaf
1 | 2
2 | 7 7 9 9
3 | 1 3 3 3 3 5 8
4 | 4 4 5

Key: 5 | 2 \ = \ 52 hours

f
11

For each of the following data sets, calculate:

i

The interquartile range

ii

The value of the lower fence

iii

The value of the upper fence

a
Minimum | 5
Q1 | 6
Median | 12
Q3 | 17
Maximum | 28
b
[Box plot drawn on an axis marked from 2 to 18]
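
The three calculations in this question follow directly from the quartiles. As a minimal sketch, the hypothetical helper `fences` below applies the 1.5 × IQR rule to the summary table in part (a); the function name and return order are illustrative choices.

```python
def fences(q1, q3):
    """Return (IQR, lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return iqr, lower_fence, upper_fence

# Quartiles taken from the table in part (a): Q1 = 6, Q3 = 17.
iqr, lower, upper = fences(q1=6, q3=17)
print(iqr, lower, upper)
```

Any value below the lower fence or above the upper fence would be flagged as a possible outlier.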
12

Consider the given dot plot:

a

Find the:

i

Median

ii

Lower quartile

iii

Upper quartile

iv

Interquartile range

v

Value of the lower fence

vi

Value of the upper fence

b

Identify any outliers.

13

For each of the following sets of data:

i

Construct the five-number summary.

ii

Calculate the interquartile range.

iii

Calculate the value of the lower fence.

iv

Calculate the value of the upper fence.

v

Would the value -5 be considered an outlier?

vi

Would the value 16 be considered an outlier?

a

9,\, 5,\, 3,\, 2,\, 6,\, 1

b

3,\, 10,\, 9,\, 2,\, 7,\, 5,\, 6

c

12,\, 5,\, 11,\, 1,\, 9,\, 8,\, 5,\, 6
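
The steps in this question can be sketched in code. The snippet assumes one common classroom convention for quartiles: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, with the middle value left out of both halves when the number of scores is odd. Other conventions (including the defaults in some software) can give slightly different quartiles. The function names are illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper_half = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower_half), median(values), median(upper_half), values[-1])

def is_outlier(x, data):
    """Check x against the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr

data_a = [9, 5, 3, 2, 6, 1]  # data set (a)
summary = five_number_summary(data_a)
print(summary, is_outlier(-5, data_a), is_outlier(16, data_a))
```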

14

For each of the following sets of data:

i

Construct the five-number summary.

ii

Would the value -3 be considered an outlier?

iii

Would the value 15 be considered an outlier?

a

1,\, 4,\, 8,\, 10,\, 6,\, 2,\, 5

b

9,\, 4,\, 6,\, 11,\, 10,\, 8,\, 10

15

For each of the data sets below:

i

Construct the five-number summary.

ii

Calculate the value of the lower fence.

iii

Calculate the value of the upper fence.

iv

Identify any outliers.

v

Create a box plot of the data with the outlier(s) displayed separately.

a
6.8,\, 4.0,\, 3.5,\, 5.1,\, 2.4,\, 1.6,\, 3.9,\, 3.5,\, 3.1,\, 3.6,\, 7.6,\, 3.7,\, 4.0,\, 5.1,\, 3.6,\, 3.8,\, 3.6,\, 6.7
b
10,\, 15,\, 12,\, 26,\, 18,\, 15,\, 11,\, 38,\, 25,\, 12,\, 19,\, 17,\, 16,\, 17,\, 11,\, 36,\, 9,\, 2,\, 21,\, 18,\, 16
c
82,\, 87,\, 92,\, 76,\, 80,\, 85,\, 71,\, 84,\, 61,\, 79,\, 81,\, 81,\, 86,\, 97,\, 101,\, 80,\, 71,\, 76,\, 78,\, 86,\, 84
16

Consider the data sets below:

  • Set A: \, 14,\, 18,\, 21,\, 19,\, 12,\, 16,\, 22,\, 20,\, 19,\, 13,\, 21,\, 20,\, 16,\, 7,\, 18,\, 20,\, 11,\, 19,\, 17,\, 24

  • Set B: \, 17,\, 9,\, 15,\, 24,\, 14,\, 13,\, 16,\, 10,\, 21,\, 14,\, 15,\, 17,\, 16,\, 13,\, 9,\, 19,\, 14,\, 18,\, 15,\, 12

a

Construct the five-number summary for each set.

b

Identify any outliers and use statistical calculations to justify your answer.

c

Create a parallel box plot of the data sets with the outlier(s) displayed separately.

17

The data point 5 is below the lower fence and is considered an outlier. The interquartile range is 12.

Find the smallest integer value the lower quartile can be.

18

The data point 37 is above the upper fence and is considered an outlier. The interquartile range is 10.

Find the largest integer value the upper quartile can be.

19

A group in a study take a test to assess their reaction time. The participants clicked a button as soon as they heard a sound which was played at random intervals. The reaction time, in milliseconds, of each participant is shown below:

220,\, 280,\, 210,\, 220,\, 215,\, 180,\, 185,\, 190,\, 190,\, 195,\, 150,\, 190,\, 195,\, 195
a

Construct the five-number summary.

b

Identify any outliers and use statistical calculations to justify your answer.

c

Create a box plot of the data with the outlier displayed separately.

d

Give a possible explanation for the outliers present.

20

\text{VO}_{2} Max is a measure of how efficiently your body uses oxygen during exercise. The more physically fit you are, the higher your \text{VO}_{2} Max.

Here are some people’s results when their \text{VO}_{2} Max was measured:

46,\, 27,\, 32,\, 46,\, 30,\, 25,\, 41,\, 24,\, 26,\, 29,\, 21,\, 21,\, 26,\, 47,\, 21,\, 30,\, 41,\, 26,\, 28,\, 26,\, 76

a

Sort the values into ascending order.

b

Determine the median \text{VO}_{2} Max.

c

Determine the upper quartile value.

d

Determine the lower quartile value.

e

Calculate 1.5 \times IQR, where IQR is the interquartile range.

f

Identify any outliers using upper and lower fences.

g

Create a box plot of the data with the outlier displayed separately.

h

An average untrained healthy person has a \text{VO}_{2} Max between 30 and 40.

Using the boxplot, what level of exercise is likely to describe the majority of people in this group?

21

The table shows the average temperature (\degree \text{C}) in a particular city over several years. Identify the year(s) in which the temperature is an outlier.

Year | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011
Temp. (°C) | 31.7 | 26.5 | 22.6 | 22.5 | 24.2 | 23.0 | 24.1 | 21.1 | 23.3 | 26.0
22

Consider the stem plot below:

a

Are there any outliers? If so, state the value.

b

Is there any clustering of data? If so, in what interval?

c

What is the mode?

d

Describe the shape of the data.

Stem | Leaf
0 | 5
1 | 7 8
2 | 0 8
3 | 1 3 3 7 8 9
4 | 1 3 5 8 8 8
5 |
6 |
7 |
8 |
9 | 2

Key: 2 \vert 3 = 23

23

The number of hours worked per week by a group of people is represented in the following Stem and Leaf Plot:

a

Are there any outliers? If so, state the value.

b

Is there any clustering of data? If so, in what interval?

c

What is the mode?

Stem | Leaf
0 | 2
1 |
2 | 0 3 6 6
3 | 1 4 5 6 6 7
4 | 0 4 6 7 9
5 | 0

Key: 2 \vert 3 = 23

24

The shoe sizes of all the students in a class were measured and the data was presented in a bar graph.

a

Are there any outliers? If so, state the value.

b

Is there any clustering of data? If so, in what interval?

c

What is the modal shoe size?

d

Describe the shape of the distribution.

25

Consider the dot plot below:

a

Are there any outliers?

b

Is there any clustering of data?

c

State the modal score(s).

d

Describe the shape of the distribution of the data.

26

Consider the data shown in the histogram:

a

Are there any outliers? If so, what is the value?

b

Is there any clustering of data? If so, in what interval?

c

What is the mode?

d

Describe the shape of the distribution of the data.

27

Temperatures were recorded over a period of time and presented as a dot plot:

a

Are there any outliers?

b

Is there any clustering of data? If so, in what interval?

c

What is the modal temperature?

d

Describe the shape of the distribution of the data.

28

Consider the histogram given:

a

Describe the shape of the distribution.

b

Determine the lower quartile score and the upper quartile score.

c

Hence, calculate the interquartile range.

d

Using the interquartile range, determine whether there are any outliers in the data set.

29

Consider the given dot plot:

a

Describe the shape of the distribution.

b

Determine the lower quartile score and the upper quartile score.

c

Hence, calculate the interquartile range.

d

Using the interquartile range, determine whether there are any outliers in the data set. If there are, find the value of the outlier(s).

30

The stem and leaf plot below shows the age of people to enter through the gates of a concert in the first 5 seconds:

a

What was the median age?

b

What was the difference between the lowest age and the median?

c

What is the difference between the highest age and the median?

d

What was the mean age? Round your answer to two decimal places.

e

Is the data positively or negatively skewed?

Stem | Leaf
1 | 0 1 2 2 3 3 4 4 4 8 8 8
2 | 1 7
3 | 4 5 5
4 | 0
5 | 4

Key: 1 | 2 \ = \ 12 years old
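
Reading a stem and leaf plot amounts to rebuilding each value as 10 × stem + leaf. The following sketch transcribes the plot above into a dictionary (the names `stem_plot` and `expand` are illustrative) and then computes the median and mean with Python's `statistics` module.

```python
from statistics import mean, median

# Stem -> leaves, transcribed from the plot above (key: 1 | 2 = 12 years old).
stem_plot = {
    1: [0, 1, 2, 2, 3, 3, 4, 4, 4, 8, 8, 8],
    2: [1, 7],
    3: [4, 5, 5],
    4: [0],
    5: [4],
}

def expand(plot):
    """Rebuild the raw values from a stem plot whose stems are tens."""
    return sorted(10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves)

ages = expand(stem_plot)
print(len(ages), median(ages), round(mean(ages), 2))
```

Comparing the mean with the median is one quick check on the direction of skew: a mean pulled above the median suggests a tail towards the larger values.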

31

Consider the histogram representing students' heights in centimetres:

a

Does the histogram most likely represent grouped data or individual scores?

b

Estimate the value of the mean to one decimal place.

32

Estimate the value of the mean of the following data set:

33

The mercury levels in 38 fishing lakes were tested and recorded in the histogram:

a

Is the distribution uni-modal, bi-modal, or multi-modal?

b

State the modal class.

34

The percentage of faulty computer chips in each of 42 batches was recorded in the histogram given:

a

Is the distribution uni-modal, bi-modal, or multi-modal?

b

State the modal classes.

35

The temperature in a classroom at 1 pm every day was measured and recorded in the histogram below:

a

Is the distribution uni-modal, bi-modal, or multi-modal?

b

State the modal classes.

36

The number of peanuts in mixed nut packets were sampled and recorded in the following stem plot:

a

Complete the following frequency distribution table:

Score | Frequency
40-49 |
50-59 |
60-69 |
70-79 |
80-89 |
90-99 |
100-109 |
110-119 |
Stem | Leaf
4 | 3 6 8
5 | 1 2 2
6 | 6 7 7 8 8
7 | 0 0 3 3 4 5 9
8 | 1 1 1 4 6 8 8 9
9 | 0 2 4 5 6 9
10 | 1 2 3 5 5 6 7 8
11 | 0 4 5 7

Key: 2 \vert 3 = 23

b

Is the distribution uni-modal, bi-modal, or multi-modal?

c

State the modal class or classes.
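
Because each stem in this plot covers one class interval of width 10, the frequency table can be filled in by counting the leaves on each stem. The sketch below does that for the stem plot in this question; the variable names are illustrative, and picking classes by highest frequency is only one way to judge modality (the overall shape of the histogram should also be considered).

```python
# Stem -> leaves, transcribed from the stem plot in this question.
stem_plot = {
    4: [3, 6, 8],
    5: [1, 2, 2],
    6: [6, 7, 7, 8, 8],
    7: [0, 0, 3, 3, 4, 5, 9],
    8: [1, 1, 1, 4, 6, 8, 8, 9],
    9: [0, 2, 4, 5, 6, 9],
    10: [1, 2, 3, 5, 5, 6, 7, 8],
    11: [0, 4, 5, 7],
}

# Each stem s corresponds to the class 10s to 10s + 9.
frequencies = {f"{10 * stem}-{10 * stem + 9}": len(leaves) for stem, leaves in stem_plot.items()}

# Classes that share the highest frequency.
highest = max(frequencies.values())
modal_classes = [cls for cls, freq in frequencies.items() if freq == highest]

for cls, freq in frequencies.items():
    print(f"{cls:>7}: {freq}")
print("Highest-frequency classes:", modal_classes)
```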

37

The reaction time of drivers was tested and recorded in the dot plot below:

a

Construct a frequency distribution table for the individual data values.

b

Is the distribution uni-modal, bi-modal, or multi-modal?

c

State the mode(s).

Effects of outliers
38

The number of three-pointers scored in a basketball game is shown in the dot plot:

The mode is 2. If the outlier is removed, what is the new mode?

39

Consider the given stem plot:

If the outlier is removed, what is the new mean? Round your answer to two decimal places.

Stem | Leaf
3 | 4 4 9
4 | 6 6 8 9
5 | 1 4
6 |
7 |
8 | 4

Key: 2 \vert 3 = 23

40

Consider the given stem plot:

If the outlier is removed, find the new range.

Stem | Leaf
2 | 5
3 |
4 | 9 9
5 | 0 0 4 5 7
6 | 2 6

Key: 1 | 2 \ = \ 12

41

Consider the following frequency table:

If the outlier is removed, what is the new mean? Round your answer to two decimal places if needed.

Weight in kilograms | Frequency
12 | 2
13 | 5
14 | 1
15 | 2
16 | 0
17 | 0
18 | 1
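
The mean of a frequency table is the sum of value × frequency divided by the total frequency. The sketch below applies this to the table above and then repeats the calculation with one observation removed. Treating the single 18 kg score as the outlier is a reading of the table, not something the question states, and the helper names are illustrative.

```python
# Weight (kg) -> frequency, transcribed from the table above.
weights = {12: 2, 13: 5, 14: 1, 15: 2, 16: 0, 17: 0, 18: 1}

def frequency_mean(table):
    """Mean of a frequency table: sum(value * frequency) / sum(frequency)."""
    total = sum(value * freq for value, freq in table.items())
    count = sum(table.values())
    return total / count

def drop_one(table, value):
    """Return a copy of the table with one observation of `value` removed."""
    reduced = dict(table)
    reduced[value] -= 1
    return reduced

# Assumption: the isolated score of 18 kg is the outlier.
print(round(frequency_mean(weights), 2))
print(round(frequency_mean(drop_one(weights, 18)), 2))
```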
42

Consider the following frequency table:

If the outlier is removed, what is the new mode?

Weight in kilograms | Frequency
14 | 1
15 | 0
16 | 0
17 | 3
18 | 6
19 | 4
20 | 2
43

The glass windows for an airplane are rolled to a certain thickness, but machine production means there is some variation. The thickness of each pane of glass produced is measured (in millimetres), and the dot plot shows the results.

a

The current median is 11.15. If the outlier is removed, what is the new median?

b

The current mean is 11.1. If the outlier is removed, what is the new mean? Round your answer to two decimal places.

44

For each of the following sets of data:

i

Find the mean, median, mode, and range. Round your answers to two decimal places where necessary.

ii

Identify the outlier.

iii

Remove the outlier from the set and recalculate the values found in part (i).

iv

Describe how each of the four statistics changed after removing the outlier.

a
53, \, 46,\, 25,\, 50,\, 30,\, 30,\, 40,\, 30,\, 47,\, 109
b
4.7,\, 2.8,\, 1.9,\, 0.9,\, 0.9,\, 2.2,\, 2.2,\, 1.2,\, 1.5,\, 0.9
c
4700,\, 4700,\, 4700,\, 4500,\, 5300,\, 4900,\, 5200,\, 4800,\, 1500,\, 5100
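
The before-and-after comparison in this question can be sketched with Python's `statistics` module. The example uses data set (a) and assumes 109 is the outlier; `describe` is a hypothetical helper, and `multimode` is used so that tied modes would all be reported.

```python
from statistics import mean, median, multimode

def describe(data):
    """Mean (2 d.p.), median, list of modes, and range of a data set."""
    return {
        "mean": round(mean(data), 2),
        "median": median(data),
        "mode": multimode(data),
        "range": max(data) - min(data),
    }

data_a = [53, 46, 25, 50, 30, 30, 40, 30, 47, 109]
# Assumption: 109 is the outlier in data set (a).
without_outlier = [x for x in data_a if x != 109]

before = describe(data_a)
after = describe(without_outlier)
print(before)
print(after)
```

In this data set the mean and the range change the most, while the median moves only slightly and the mode does not move at all.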
45

True or False: When the outlier is removed from a set of data, the range will always decrease.

46

For each of the following scenarios, determine whether the outlier that was removed must have had a value smaller or larger than the values that remain:

a

A set of data has an outlier removed and the mean lowers.

b

A set of data has an outlier removed and the mean rises.

c

A set of data has an outlier removed and the median lowers.

d

A set of data has an outlier removed and the median rises.

47

When an outlier is removed from a data set, describe the effect on the following:

a
Mode
b
Range
c
Mean
d
Median
48

The selling prices of recently sold houses are:

\$467\,000,\, \$413\,000,\, \$410\,000,\, \$456\,000,\, \$487\,000,\, \$929\,000

a

Find the mean selling price, to the nearest thousand dollars.

b

Which of the selling prices raises the mean so that it is not reflective of most of the prices?

c

Recalculate the mean selling price excluding this outlier.

49

Seven millionaires with an average net wealth of \$41 million with a standard deviation of \$7 million are having a party. Suddenly Carlos Slim, who has a net wealth estimated to be \$31 billion, walks into the room.

a

What is the new average net wealth (in millions) in the room? Give your answer rounded to the nearest million.

b

When Carlos Slim's net worth is taken into account, will the standard deviation be higher, lower or unchanged from before?

c

Will the mode be higher, lower or unchanged from before if at least two of the millionaires have the same net wealth?

d

Will the range be higher, lower or unchanged from before?
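
The arithmetic behind part (a) is a combined mean: rebuild the group total from the old mean, add the new value, and divide by the new count. A minimal sketch, working in millions of dollars with illustrative variable names:

```python
n_millionaires = 7
mean_wealth = 41          # millions of dollars
newcomer_wealth = 31_000  # $31 billion expressed in millions

# Total wealth of the original group, recovered from its mean.
total = n_millionaires * mean_wealth + newcomer_wealth
new_mean = total / (n_millionaires + 1)

print(round(new_mean))  # new average in millions, to the nearest million
```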

Suitability of measures of centre
50

State the measure of centre which more accurately describes the centre of each of the following data sets:

a
12,\, 15,\, 16,\, 21,\, 22,\, 25
b
15,\, 13,\, 16,\, 17,\, 15,\, 15,\, 15
c
8,\, 10,\, 14,\, 18,\, 19,\, 91
51

The selling price of recently sold houses is given below:

\$760\,000,\, \$650\,000,\, \$810\,000,\, \$780\,000,\, \$760\,000,\, \$590\,000,\, \$1\,360\,000

a

Find the mean selling price. Round your answer to the nearest thousand dollars.

b

Find the median selling price.

c

Recalculate the mean selling price excluding the outlier.

d

Recalculate the median selling price excluding the outlier.

e

Which measure of centre best identifies the typical selling price of recently sold houses? Explain your answer.

52

The weights of fish caught in a "weigh and release" fishing competition, in kilograms, are given below:

12.5,\, 15.1,\, 13,\, 14.2,\, 14.5,\, 14.9,\, 12.5,\, 14.3,\, 1.5

a

Find the mean weight.

b

Find the median weight.

c

Recalculate the mean weight excluding the outlier.

d

Recalculate the median weight excluding the outlier.

e

Which measure of centre best identifies the typical fish weight? Explain your answer.

53

The number of minutes spent studying by each of a group of students is given:

85,\, 85,\, 95,\, 70,\, 60,\, 70,\, 105

Which measure of centre more accurately describes the centre of this data set?

54

A netball team has so far played 8 games in the season. Their final score in each game is given:

36,\, 46,\, 49,\, 40,\, 52,\, 44,\, 39,\, 37

Which measure of centre is the best indicator of the team's final score in a game?

55

The number of songs Oprah has downloaded in each of the last seven months is given below:

6,\, 12,\, 18,\, 9,\, 0,\, 3,\, 15

Which measure of centre is most representative of the number of songs downloaded each month?

56

Carl has been recording his spelling test scores for the past semester. His scores were: 14,\, 16,\, 2,\, 15,\, 15,\, 16,\, 15.

a

Calculate the median of Carl's scores.

b

Calculate the mean of Carl's scores. Round your answer to two decimal places.

c

Which measure of centre more accurately describes the centre of this data set?

57

The number of animal races that were won by a trainer over the years are listed in the table. Which measure of centre should you use for the data? Explain your answer.

Year | Races won
2003 | 116
2004 | 105
2005 | 102
2006 | 108
2007 | 113
58

The salaries of part-time employees at a company are given in the dot plot below. Which measure of centre best reflects the typical wage of a part-time employee? Explain your answer.

59

The ages of students who participated in extra-curricular activities were recorded, and the results are presented in the dot plot below.

a

Find the mean age of participation among the sample of students.

b

Find the median age of participation among the sample of students.

c

By looking at the dot plot, are the mean and median reliable measures for the age of the typical student who participates in activities? Explain your answer.

60

A journalist wanted to report on road speed cameras being used as revenue raisers. She obtained data that showed the number of times 20 speed cameras issued a fine to motorists in one month. The results were:

101,\, 102,\, 115,\, 115,\, 121,\, 124,\, 127,\, 128,\, 130,\, 130,\\ 143,\, 143,\, 146,\, 162,\, 162,\, 163,\, 178,\, 183,\, 194,\, 977

The journalist wants to give the impression that speed cameras are just being used to raise revenue. Which measure of centre should she use in her article? Explain your answer.
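
Computing both measures for the speed camera data shows how far a single extreme value can pull them apart. A short sketch using Python's `statistics` module (the variable names are illustrative):

```python
from statistics import mean, median

# Number of fines issued by each of the 20 speed cameras in one month.
fines = [101, 102, 115, 115, 121, 124, 127, 128, 130, 130,
         143, 143, 146, 162, 162, 163, 178, 183, 194, 977]

fines_mean = mean(fines)
fines_median = median(fines)
# How many cameras actually issued more fines than the mean?
above_mean = sum(1 for f in fines if f > fines_mean)

print(fines_mean, fines_median, above_mean)
```

Only a few cameras exceed the mean, which suggests the mean is being pulled upwards by the camera that issued 977 fines.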

61

The selling prices of artworks sold at an auction are given below:

\$18\,000,\, \$11\,000,\, \$17\,000,\, \$20\,000,\, \$18\,000,\, \$16\,000,\, \$15\,000,\, \$218\,000

Which measure of centre best identifies the typical selling price of recently sold artwork? Explain your answer.

Outcomes

2.3.1.6

describe the graphical displays in terms of the number of modes, shape (symmetric versus positively or negatively skewed), measures of centre and spread, and outliers and interpret this information in the context of the data

2.3.2.1

construct and use parallel box plots (including the use of the Q1 − 1.5 × IQR ≤ 𝑥 ≤ Q3 + 1.5 × IQR criteria for identifying possible outliers) to compare datasets in terms of median, spread (IQR and range) and outliers to interpret and communicate the differences observed in the context of the data
