12. Univariate Data

12.01 Centre of data

Measures of central tendency attempt to summarise a set of data with a single value that describes the centre or middle of the scores.

The main measures of central tendency are the mean, median, and mode.

The mean

The mean is often described as the average of the numbers in a data set. It is defined as the sum of the scores divided by the number of scores.

The symbol for the mean of a sample is $\overline{x}$, whilst the population mean is represented by the symbol $\mu$ (Greek letter 'mu').

To calculate the mean, we add up all the scores in a data set, then divide this total by the number of scores (the total frequency).

Hint

To find the sum of all the scores, we can either add up each individual score, or, if certain scores are repeated, we can add the products of the scores and their frequencies (that is, add the products $f\times x=fx$). We will see this in action in Example 2.

Now let's have a go at calculating the mean of data sets ourselves.

Worked examples

Example 1

Find the mean from the data in the stem plot below:

Stem	Leaf
$2$	$3$ $8$
$3$	$1$ $1$ $1$
$4$	$0$ $3$
$5$	$0$ $3$ $8$ $8$
$6$	$2$ $2$ $9$
$7$	$1$ $8$
$8$	$3$
$9$	$0$ $0$ $1$

Think: We can find the mean by adding up all of the scores, then dividing the total by the number of scores.

Do:

$\text{Mean}$	$=$	$\frac{23+28+3\times31+40+43+50+53+2\times58+2\times62+69+71+78+83+2\times90+91}{20}$
	$=$	$\frac{1142}{20}$
	$=$	$57.1$
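
As a quick check of Example 1, the stem plot can be expanded into individual scores and averaged. The following is a minimal Python sketch rather than part of the lesson itself: the name `stem_plot` and the other variable names are illustrative, and it assumes each stem is a tens digit and each leaf a units digit.

```python
# Stem plot from Example 1, stored as {stem: [leaves]}.
# Assumption: stems are tens digits and leaves are units digits.
stem_plot = {
    2: [3, 8],
    3: [1, 1, 1],
    4: [0, 3],
    5: [0, 3, 8, 8],
    6: [2, 2, 9],
    7: [1, 8],
    8: [3],
    9: [0, 0, 1],
}

# Rebuild each score as 10 * stem + leaf.
scores = [10 * stem + leaf for stem, leaves in stem_plot.items() for leaf in leaves]

total = sum(scores)   # sum of the scores
count = len(scores)   # number of scores
mean = total / count  # sum of the scores divided by the number of scores

print(total, count, mean)
```

The printed total and count should match the numerator and denominator of the fraction in the working above.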

Example 2

A statistician has organised a set of data into the frequency table shown:

Score ($x$)	Frequency ($f$)
$44$	$8$
$46$	$10$
$48$	$6$
$50$	$18$
$52$	$5$

(a) Complete the frequency distribution table by adding the $fx$ column:

Think: For each score ($x$ value), the value in the $fx$ column will be that score multiplied by its frequency $f$.

Score ($x$)	Frequency ($f$)	$fx$
$44$	$8$	$352$
$46$	$10$	$460$
$48$	$6$	$288$
$50$	$18$	$900$
$52$	$5$	$260$
Totals	$47$	$2260$

(b) Calculate the mean of this data set, correct to two decimal places.

Think: We calculate the mean by dividing the sum of the scores (that is, the total $fx$) by the number of scores (the total $f$).

Do:

$\text{Mean}$	$=$	$\frac{2260}{47}$
	$=$	$48.0851\ldots$
	$=$	$48.09$ ($2$ d.p.)
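
The frequency-table approach from Example 2 can also be sketched in Python. This is only an illustration: the list `table` and the names `fx_column`, `total_f` and `total_fx` are assumptions made for this sketch, chosen to mirror the column headings above.

```python
# Frequency table from Example 2 as (score, frequency) pairs.
table = [(44, 8), (46, 10), (48, 6), (50, 18), (52, 5)]

# Build the fx column: each score multiplied by its frequency.
fx_column = [x * f for x, f in table]

total_f = sum(f for _, f in table)  # total number of scores
total_fx = sum(fx_column)           # sum of all the scores

mean = total_fx / total_f
print(fx_column, total_f, total_fx, round(mean, 2))
```

Multiplying each score by its frequency avoids writing out all $47$ scores individually, which is the point of the hint earlier in this lesson.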

Practice questions

Question 1

Find the mean of the following scores:

$8$, $15$, $6$, $27$, $3$.

Question 2

The mean of $4$ scores is $21$. If three of the scores are $17$, $3$ and $8$, find the $4$th score, $x$.

Enter each line of working as an equation.

The median

The median is another measure of central tendency. It is one way of describing a value that represents the middle or the centre of a data set. The median (which kind of sounds like medium) is the middle score in a data set.

Remember!

The data must be ordered (usually in ascending order) before calculating the median.

Which term is in the middle?

Suppose we have five numbers in our data set: $4$, $11$, $15$, $20$ and $24$.

The median would be $15$ because it is the value right in the middle. There are two numbers on either side of it.

$4,11,\boxed{15},20,24$

If we have a larger data set, however, we may not be able to see straight away which term is in the middle. There are two methods we can use to help us work this out.

The "cross out" method

Once a data set is ordered, we can cross out numbers in pairs (one high number and one low number) until there is only one number left. Let's check out this process using an example. Here is a data set with nine numbers: $1,1,3,5,7,9,9,10,15$.

1. Check that the data is sorted in ascending order (i.e. in order from smallest to largest).

2. Cross out the smallest and the largest number. Here, that means crossing out the first $1$ and the $15$.

3. Repeat step 2, working from the outside in, taking the smallest number and the largest number each time until there is only one term left. We can see in this example that the median is $7$.

Note that this process will only leave one term if there is an odd number of terms to start with. If there is an even number of terms, this process will leave two terms instead (if you cross them all out, you've gone too far)! To find the median of a set with an even number of terms, we then take the mean of these two remaining middle terms.
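
The "cross out" method can be sketched as a short Python function. This is one possible way to write it, not a prescribed method: the function name `median_by_crossing_out` is hypothetical, and the sketch assumes the list contains at least one score.

```python
def median_by_crossing_out(scores):
    """Cross out the smallest and largest remaining scores until one or two are left."""
    remaining = sorted(scores)       # the data must be in ascending order first
    while len(remaining) > 2:
        remaining = remaining[1:-1]  # cross out one low score and one high score
    # One score left: that is the median. Two left: take their mean.
    return sum(remaining) / len(remaining)


# The nine-number set used in this section (odd number of terms).
print(median_by_crossing_out([1, 1, 3, 5, 7, 9, 9, 10, 15]))

# A set with an even number of terms leaves two middle scores.
print(median_by_crossing_out([8, 12, 17, 20]))
```

Stopping the loop when two or fewer scores remain is what prevents us from "going too far" and crossing out everything in the even case.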

The "counting terms" method

We can also work out which term will be the middle number by using the following formula:

Finding the median position

Let $n$ be the number of terms. Then the middle term is the $\frac{n+1}{2}$th term.

So if we use the same set of nine numbers from the previous example, $1,1,3,5,7,9,9,10,15$, we see that:

$\text{Middle term}$	$=$	$\frac{9+1}{2}$
	$=$	$5$

This means the fifth term will be the median: $1,1,3,5,\boxed{7},9,9,10,15$.

So again, we find that the median is $7$.

Let's now try this with an even number of terms. Here is a data set with four terms: $8,12,17,20$. This time, we get:

$\text{Middle term}$	$=$	$\frac{4+1}{2}$
	$=$	$2.5$th term

What does the "$2.5$th term" mean? Well, just like when we used the "cross out" method, the $2.5$th term means the average (mean) of the second and third terms. Again, remember that the data must be in order before counting along to the median position. So in this example, the median will be the average of $12$ and $17$.

$\text{Median}$	$=$	$\frac{12+17}{2}$
	$=$	$14.5$
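
The "counting terms" method translates fairly directly into code. The sketch below is illustrative only: the function name `median_by_position` is hypothetical, the list is assumed to be non-empty, and the standard-library `statistics.median` is used purely as a cross-check.

```python
import statistics


def median_by_position(scores):
    """Find the median using the (n + 1) / 2 position formula."""
    ordered = sorted(scores)            # the data must be in order first
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position of the middle term
    if position.is_integer():
        return ordered[int(position) - 1]  # convert to a 0-based index
    # A position such as 2.5 means the mean of the terms on either side.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


odd_set = [1, 1, 3, 5, 7, 9, 9, 10, 15]
even_set = [8, 12, 17, 20]

print(median_by_position(odd_set), statistics.median(odd_set))
print(median_by_position(even_set), statistics.median(even_set))
```

Lists in Python are indexed from $0$, so the code subtracts $1$ from the position given by the formula, which counts terms starting from $1$.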

Practice questions

Question 3

Consider the following scores:

$23,25,13,9,11,21,24,17,20$

(a) Sort the scores in ascending order.
(b) Calculate the median.

Question 4

Write down $4$ consecutive odd numbers whose median is $40$.

Write all solutions on the same line separated by a comma.

Question 5

Determine the following using the histogram:

[Histogram showing the distribution of scores. The x-axis is titled "Score" and runs from $44$ to $48$ in steps of $1$. The y-axis is titled "Frequency" and is labelled from $5$ to $25$ in major intervals of $5$ and minor intervals of $1$. The column heights are $20$ for score $44$, $17$ for score $45$, $6$ for score $46$, $11$ for score $47$ and $13$ for score $48$. The frequencies are read from the graph and are not explicitly labelled.]

(a) The total number of scores.
(b) The median.

Question 6

Find the median from the frequency distribution table:

Score	Frequency
$23$	$2$
$24$	$26$
$25$	$37$
$26$	$24$
$27$	$25$

The mode

The mode is another measure of central tendency - that is, it's a third way of describing a value that represents the centre of the data set. The mode describes the most frequently occurring score. A handy way to remember this is that "mode" and "most" start with the same two letters.

Let's say we ask $10$ people how many pets they have. $2$ people say no pets, $6$ people say one pet and $2$ people say they have two pets. What is the most common number of pets for people to have? In this case, the most common number is one pet, because the largest number of people $\left(\frac{6}{10}\right)$ had one pet. So the mode of this data set is $1$.
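
The pets example can be checked by counting how often each score occurs. This is a small Python sketch under stated assumptions: the list `pets` simply writes out the ten responses described above, and the variable names are illustrative.

```python
from collections import Counter

# Number of pets reported by each of the 10 people in the example above.
pets = [0, 0, 1, 1, 1, 1, 1, 1, 2, 2]

counts = Counter(pets)          # maps each score to its frequency
highest = max(counts.values())  # the largest frequency
modes = [score for score, f in counts.items() if f == highest]

print(counts[1], modes)
```

Collecting the result in a list is a design choice for this sketch: it means that a data set in which two or more scores tie for the highest frequency would report all of them.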

Remember!

The mode is the most frequently occurring score.

For data that has been grouped into class intervals, we no longer have a frequency for individual scores. Instead, we have the frequency for each class interval. In these situations, we call the class interval with the highest frequency the modal class. So for grouped data, the modal class is the equivalent of the mode.

Remember!

For grouped data, the modal class is the class interval with the highest frequency.

Worked example

Example 3

A statistician organised a set of data into the frequency table shown:

Score ($x$)	Frequency ($f$)
$10$	$18$
$20$	$10$
$30$	$26$
$40$	$18$
$50$	$15$

Think: The mode is the score that occurs the most frequently.

Do: The highest number in the frequency column is $26$. This corresponds to the score of $30$, and therefore the mode is $30$.
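
Reading the mode from a frequency table, as in Example 3, amounts to finding the score with the largest frequency. A minimal Python sketch, assuming the table is stored in a dictionary called `table` (a name chosen only for this illustration):

```python
# Frequency table from Example 3 as {score: frequency}.
table = {10: 18, 20: 10, 30: 26, 40: 18, 50: 15}

# Pick the score whose frequency is largest.
# Note: if several scores tied for the highest frequency,
# max() would return only the first of them.
mode = max(table, key=table.get)

print(mode, table[mode])
```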

Practice questions

Question 7

Find the mode of the following scores:

$8,18,5,2,2,10,8,5,14,14,8,8,10,18,14,5$

Question 8

Find the mode from the histogram shown.
