Measures of central tendency attempt to summarise a set of data with a single value that describes the centre or middle of the scores.
The main measures of central tendency are the mean, median, and mode. Deciding which one is best depends on some other characteristics of the particular set of data, and we will look further into this in chapter 11.05.
The mean is often described as the average of the numbers in a data set. It is defined as the sum of the scores divided by the number of scores.
The symbol for the mean of a sample is $\overline{x}$, whilst the population mean is represented by the symbol $\mu$ (Greek letter 'mu').
To calculate the mean, we add up all the scores in a data set, then divide this total by the number of scores (the total frequency).
To find the sum of all the scores, we can either add up each individual score, or, if certain scores are repeated, we can add the products of the scores and their frequencies (that is, add the products $f\times x=fx$). We will see this in action in Example 2.
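In symbols, the two ways of totalling the scores can be summarised as follows. The summation sign $\sum$ and the symbol $n$ for the number of scores are notation assumed here for convenience; the text itself only uses $x$, $f$ and $fx$.

$$\overline{x}=\frac{\sum x}{n}\qquad\text{or, with a frequency table,}\qquad\overline{x}=\frac{\sum fx}{\sum f}$$

Both forms give the same value, because $\sum fx$ is the sum of all the scores and $\sum f$ is the number of scores.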
Now let's have a go at calculating the mean of data sets ourselves.
Find the mean from the data in the stem plot below:
Stem | Leaf |
---|---|
$2$ | $3$ $8$ |
$3$ | $1$ $1$ $1$ |
$4$ | $0$ $3$ |
$5$ | $0$ $3$ $8$ $8$ |
$6$ | $2$ $2$ $9$ |
$7$ | $1$ $8$ |
$8$ | $3$ |
$9$ | $0$ $0$ $1$ |
Think: We can find the mean by adding up all of the scores, then dividing the total by the number of scores.
Do:
$$\begin{aligned}
\text{Mean} &= \frac{23+28+3\times31+40+43+50+53+2\times58+2\times62+69+71+78+83+2\times90+91}{20} \\
&= \frac{1142}{20} \\
&= 57.1
\end{aligned}$$
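The same calculation can be checked with a few lines of Python. This is only a sketch: the list `scores` is the stem plot read off by hand (stem as the tens digit, leaf as the units digit), and the variable names are chosen here for illustration.

```python
# Scores read off the stem plot (stem = tens digit, leaf = units digit).
scores = [23, 28, 31, 31, 31, 40, 43, 50, 53, 58, 58,
          62, 62, 69, 71, 78, 83, 90, 90, 91]

total = sum(scores)   # sum of the scores
count = len(scores)   # number of scores
mean = total / count  # mean = sum of scores / number of scores

print(total, count, mean)  # 1142 20 57.1
```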
A statistician has organised a set of data into the frequency table shown:
Score ($x$) | Frequency ($f$) |
---|---|
$44$ | $8$ |
$46$ | $10$ |
$48$ | $6$ |
$50$ | $18$ |
$52$ | $5$ |
(a) Complete the frequency distribution table by adding the $fx$ column:
Think: For each score ($x$ value), the value in the $fx$ column will be that score multiplied by its frequency $f$.
Score ($x$) | Frequency ($f$) | $fx$ |
---|---|---|
$44$ | $8$ | $352$ |
$46$ | $10$ | $460$ |
$48$ | $6$ | $288$ |
$50$ | $18$ | $900$ |
$52$ | $5$ | $260$ |
Totals | $47$ | $2260$ |
(b) Calculate the mean of this data set, correct to two decimal places.
Think: We calculate the mean by dividing the sum of the scores (that is, the total $fx$) by the number of scores (the total $f$).
Do:
$$\begin{aligned}
\text{Mean} &= \frac{2260}{47} \\
&= 48.0851\ldots \\
&= 48.09 \text{ (2 d.p.)}
\end{aligned}$$
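As a rough check of the working above, the frequency table can be stored as a Python dictionary and the totals of the $f$ and $fx$ columns computed directly. The names `table`, `sum_f` and `sum_fx` are illustrative choices, not notation from the lesson.

```python
# Frequency table from the example: score -> frequency.
table = {44: 8, 46: 10, 48: 6, 50: 18, 52: 5}

sum_f = sum(table.values())                    # total of the f column
sum_fx = sum(x * f for x, f in table.items())  # total of the fx column
mean = sum_fx / sum_f                          # mean = total fx / total f

print(sum_f, sum_fx, round(mean, 2))  # 47 2260 48.09
```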
Find the mean of the following scores:
$8$, $15$, $6$, $27$, $3$.
The mean of $4$ scores is $21$. If three of the scores are $17$, $3$ and $8$, find the fourth score, $x$.
Enter each line of working as an equation.
The median is another measure of central tendency. It is one way of describing a value that represents the middle or the centre of a data set. The median (which kind of sounds like medium) is the middle score in a data set.
The data must be ordered (usually in ascending order) before calculating the median.
Suppose we have five numbers in our data set: $4$, $11$, $15$, $20$ and $24$.
The median would be $15$ because it is the value right in the middle. There are two numbers on either side of it.
$4,11,\boxed{15},20,24$
If we have a larger data set, however, we may not be able to see straight away which term is in the middle. There are two methods we can use to help us work this out.
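Once a data set is ordered, we can cross out numbers in pairs (one high number and one low number) until there is only one number left. Let's check out this process using an example. Here is a data set with nine numbers:
$1,1,3,5,7,9,9,10,15$
Crossing out the lowest and highest remaining numbers four times leaves only $7$, so the median is $7$.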
Note that this process will only leave one term if there are an odd number of terms to start with. If there are an even number of terms, this process will leave two terms instead (if you cross them all out, you've gone too far)! To find the median of a set with an even number of terms, we can then take the mean of these two remaining middle terms.
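The crossing-out process can be sketched in Python as below. The function name `median_by_crossing_out` is made up for this illustration, the function assumes a non-empty list, and the four-number list in the second call is an illustrative set rather than one taken from the lesson.

```python
def median_by_crossing_out(values):
    """Cross out the lowest and highest remaining values in pairs."""
    remaining = sorted(values)       # the data must be ordered first
    while len(remaining) > 2:
        remaining = remaining[1:-1]  # cross out one low and one high value
    # One value is left for an odd count, two for an even count;
    # taking the mean of what is left covers both cases.
    return sum(remaining) / len(remaining)

print(median_by_crossing_out([4, 11, 15, 20, 24]))  # 15.0
print(median_by_crossing_out([4, 11, 15, 20]))      # 13.0
```

Stopping when one or two values remain is what prevents "going too far" and crossing everything out.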
We can also work out which term will be the middle number by using the following formula:
Let $n$ be the number of terms. Then the middle term is the $\frac{n+1}{2}$th term.
So if we use the same set of nine numbers from the previous example, $1,1,3,5,7,9,9,10,15$, we see that:
$$\begin{aligned}
\text{Middle term} &= \frac{9+1}{2} \\
&= 5\text{th term}
\end{aligned}$$
So again, we find that the median is the fifth term, $7$.
Let's now try this with an even number of terms. Here is a data set with four terms: $8,12,17,20$. This time, we get:
$$\begin{aligned}
\text{Middle term} &= \frac{4+1}{2} \\
&= 2.5\text{th term}
\end{aligned}$$
What does the "$2.5$th term" mean? Well, just like when we used the "cross out" method, the $2.5$th term means the average (mean) of the second and third terms. Again, remember that the data must be in order before counting along to the median position. So in this example, the median will be the average of $12$ and $17$.
$$\begin{aligned}
\text{Median} &= \frac{12+17}{2} \\
&= 14.5
\end{aligned}$$
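The position formula translates into code fairly directly. The sketch below uses a hypothetical helper named `median_by_position` and assumes a non-empty list; note that positions in the formula count from 1, while Python list indices count from 0.

```python
def median_by_position(values):
    """Find the median using the (n + 1) / 2 position formula."""
    ordered = sorted(values)  # the data must be in order first
    n = len(ordered)
    position = (n + 1) / 2    # 1-based position of the middle term
    if position.is_integer():
        # Odd number of terms: the position lands exactly on one term.
        return ordered[int(position) - 1]
    # Even number of terms: a position such as 2.5 means the mean of
    # the 2nd and 3rd terms, which sit at indices 1 and 2.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

print(median_by_position([1, 1, 3, 5, 7, 9, 9, 10, 15]))  # 7
print(median_by_position([8, 12, 17, 20]))                # 14.5
```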
Consider the following scores:
$23,25,13,9,11,21,24,17,20$
Sort the scores in ascending order.
Calculate the median.
Write down $4$ consecutive odd numbers whose median is $40$.
Write all solutions on the same line separated by a comma.
Determine the following using the histogram:
The total number of scores.
The median.
Find the median from the frequency distribution table:
Score | Frequency |
---|---|
$23$ | $2$ |
$24$ | $26$ |
$25$ | $37$ |
$26$ | $24$ |
$27$ | $25$ |
The mode is another measure of central tendency - that is, it's a third way of describing a value that represents the centre of the data set. The mode describes the most frequently occurring score. Remember that the word and the meaning start with the same two letters.
Let's say we ask $10$ people how many pets they have. $2$ people say no pets, $6$ people say one pet and $2$ people say they have two pets. What is the most common number of pets for people to have? In this case, the most common number is one pet, because the largest number of people $\left(\frac{6}{10}\right)$ had one pet. So the mode of this data set is $1$.
The mode is the most frequently occurring score
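The pets example can be reproduced with Python's standard `collections.Counter`. This is a sketch only: the list `pets` is built from the counts given above, and collecting the result into a list called `modes` is a choice made here so that a tie for the highest frequency would show every tied score.

```python
from collections import Counter

# Responses from the ten people: number of pets each person has.
pets = [0] * 2 + [1] * 6 + [2] * 2

counts = Counter(pets)          # score -> how often it occurs
highest = max(counts.values())  # the largest frequency
modes = [score for score, freq in counts.items() if freq == highest]

print(modes)  # [1]
```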
For data that has been grouped into class intervals, we no longer have a frequency for individual scores. Instead, we have the frequency for each class interval. In these situations, we call the class interval with the highest frequency the modal class. So for grouped data, the modal class is the equivalent of the mode.
For grouped data, the modal class is the class interval with the highest frequency.
A statistician organised a set of data into the frequency table shown:
Score ($x$) | Frequency ($f$) |
---|---|
$10$ | $18$ |
$20$ | $10$ |
$30$ | $26$ |
$40$ | $18$ |
$50$ | $15$ |
Think: The mode is the score that occurs the most frequently.
Do: The highest number in the frequency column is $26$. This corresponds to the score of $30$, and therefore the mode is $30$.
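When the data is already in a frequency table, finding the mode amounts to picking the score with the largest frequency. A minimal Python sketch, with the table stored in a dictionary named `table` (an illustrative name), might look like this. It assumes a single highest frequency, as is the case here.

```python
# Frequency table from the example: score -> frequency.
table = {10: 18, 20: 10, 30: 26, 40: 18, 50: 15}

# Pick the score whose frequency is largest.
mode = max(table, key=table.get)

print(mode, table[mode])  # 30 26
```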
Find the mode of the following scores:
$8,18,5,2,2,10,8,5,14,14,8,8,10,18,14,5$
Find the mode from the histogram shown.