Statistics

Lesson

Mean scores are often called "averages" in everyday life. Averages are used a lot, both inside and outside the classroom. For example, when your teacher gives you your test marks back, they often tell you the "average" mark so that you can get a sense of how you performed compared to the rest of your class. However, you have to be careful when comparing scores to the mean, because the mean can sometimes be misleading.

Let's look at one example of this. Let's say that your maths class has $10$ students in it, and these are everyone's test scores:

Student | Score (%) |
---|---|
Alice | $70$ |
Bob | $65$ |
Charlie | $60$ |
Daniel | $55$ |
Emily | $50$ |
Frank | $61$ |
Geoffrey | $57$ |
Harry | $48$ |
Isobel | $72$ |
You | $63$ |

Let's calculate the average mark for this class, using the mean. To do this, we add all the scores up, and then divide by $10$.

$$\text{Mean} = \frac{70+65+60+55+50+61+57+48+72+63}{10} = \frac{601}{10} = 60.1\%$$
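If you want to check this on a computer, here is a short Python sketch of the same calculation. Python is just our choice for illustration (the lesson itself doesn't use any code), and the helper name `mean` is our own: it adds the scores up and divides by how many there are.

```python
# Test scores for the original class of 10 students, copied from the table.
scores = {
    "Alice": 70, "Bob": 65, "Charlie": 60, "Daniel": 55, "Emily": 50,
    "Frank": 61, "Geoffrey": 57, "Harry": 48, "Isobel": 72, "You": 63,
}


def mean(values):
    """Add all the values up, then divide by how many values there are."""
    values = list(values)
    return sum(values) / len(values)


class_mean = mean(scores.values())
print(f"Class mean: {class_mean:.1f}%")  # prints "Class mean: 60.1%"
print("Your score is above the mean:", scores["You"] > class_mean)
```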

Since the class average was around $60\%$, and your mark is $63\%$, you'd probably be feeling pretty good. You're above average!

However, the next day let's say a new student comes to your class. That student is Terence Tao (陶哲軒), an Australian mathematician of Hong Kong ancestry who has been described as the smartest person in the world. When Terence was $8$ years old, he was teaching calculus to high school students, and he started university when he was just $14$ years old. In 2014, he won a $\$3$ million prize for his groundbreaking discoveries in mathematics. Needless to say, Terence would find your test fairly easy. Let's say that in the next test everyone gets the same score as before, and Terence gets $100\%$. Now how would your class's scores look?

Student | Score (%) |
---|---|
Alice | $70$ |
Bob | $65$ |
Charlie | $60$ |
Daniel | $55$ |
Emily | $50$ |
Frank | $61$ |
Geoffrey | $57$ |
Harry | $48$ |
Isobel | $72$ |
You | $63$ |
Terence | $100$ |

Now what is the average? Let's add them up and see.

$$\text{Mean} = \frac{70+65+60+55+50+61+57+48+72+63+100}{11} = \frac{701}{11} \approx 63.7\%$$
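The same kind of sketch shows how much one outlier pulls the mean up. It assumes, as in the table, that everyone else's score stays the same and only Terence's $100$ is added; the variable names are our own.

```python
def mean(values):
    """Add all the values up, then divide by how many values there are."""
    return sum(values) / len(values)


original_class = [70, 65, 60, 55, 50, 61, 57, 48, 72, 63]
your_score = 63

# Terence joins and scores 100; everyone else's score is unchanged.
with_terence = original_class + [100]

before = mean(original_class)
after = mean(with_terence)
print(f"Mean before Terence: {before:.1f}%")  # 60.1%
print(f"Mean after Terence:  {after:.1f}%")   # 63.7%
print("Your score is above the mean now:", your_score > after)  # False
```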

Now your score of $63\%$ is below average. Oh no! But really, if we think about it, it doesn't make sense to be disappointed. You did just as well in the test, and just because there is a genius in the same class as you, it shouldn't change how you look at your mark.

A result like Terence's, which is far away from all the other results and changes the mean significantly, is called an "outlier". Outliers are one of the biggest problems with using the mean as an average.

Let's say that another new student comes: John von Neumann. John was a Hungarian mathematician described by many as the smartest person of the 20th century. When he was only $6$ years old, he could divide $8$-digit numbers in his head and memorise pages of phone books. When he grew up, he made huge discoveries in mathematics, physics and computing. What would happen if John also joined your class and got $100\%$ on your test?

One way to get around this problem of outliers changing your average too much is to use the "median" as the average instead. The median is the score of the "middle person" in our class. To find it, we line the scores up in ascending order and pick the middle one. If there is an even number of scores, there are two middle ones, so the median is the number halfway between them.
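Here is a sketch of that procedure in Python. The `median` function is our own, written out by hand so you can see each step (Python's built-in `statistics.median` does the same job). When there is an even number of scores, it takes the number halfway between the two middle ones. It doesn't print the answer for you in advance, so you can use it to check your own working.

```python
def median(values):
    """Line the values up in ascending order and pick the middle one.

    With an even number of values there are two middle ones, so we return
    the number halfway between them.
    """
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


original_class = [70, 65, 60, 55, 50, 61, 57, 48, 72, 63]
print("Median of the original class:", median(original_class))
```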

What would the median be for your original class of $10$ people?

What would the median be for the class with Terence and John in it?

How does the change in the median when the two geniuses were added compare with the change in the mean?
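Once you have worked out your answers, a sketch like this one lets you check them. It uses Python's standard `statistics` module (`statistics.mean` and `statistics.median`) and assumes that Terence and John both score $100$ while nobody else's score changes.

```python
from statistics import mean, median

original_class = [70, 65, 60, 55, 50, 61, 57, 48, 72, 63]
# Assume Terence and John both score 100 and nobody else's score changes.
with_geniuses = original_class + [100, 100]

mean_change = mean(with_geniuses) - mean(original_class)
median_change = median(with_geniuses) - median(original_class)

print(f"Mean:   {mean(original_class):.2f} -> {mean(with_geniuses):.2f} (change {mean_change:+.2f})")
print(f"Median: {median(original_class):.2f} -> {median(with_geniuses):.2f} (change {median_change:+.2f})")
```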

If your teacher uses the mean as the average for your class, you can make everyone above average using this simple trick. Just tell your teacher that there are a few "new students" in your class on the day of the test, dress up a few sacks of sand in school uniforms (with smiley faces painted on), and sit them at desks.

The sacks of sand will all get $0\%$ on their test (hopefully). Let's have a look at what would happen to the mean in the class we looked at before (without the geniuses).

Student | Score (%) |
---|---|
Alice | $70$ |
Bob | $65$ |
Charlie | $60$ |
Daniel | $55$ |
Emily | $50$ |
Frank | $61$ |
Geoffrey | $57$ |
Harry | $48$ |
Isobel | $72$ |
You | $63$ |
Sack of Sand #1 | $0$ |

What would the mean be for this class?

How many sacks of sand would you need so that every real student's score is above the class mean?

How many sacks of sand do you think you would need for this to work in your real class at school?

Would this work if your teacher uses the median as the average?
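To check your answers, here is a sketch that adds sacks of sand one at a time until every real student beats the mean, then compares the median with and without the sacks. It uses the scores from the first table. The helper `sacks_needed` is our own invention, and it assumes every real student scored more than $0$ (otherwise the loop would never finish).

```python
from statistics import median

real_students = [70, 65, 60, 55, 50, 61, 57, 48, 72, 63]


def mean(values):
    """Add all the values up, then divide by how many values there are."""
    return sum(values) / len(values)


def sacks_needed(scores):
    """Add sand sacks (each scoring 0) one at a time until every real
    student's score is above the class mean. Assumes every real student
    scored more than 0, otherwise the loop would never stop."""
    sacks = 0
    while min(scores) <= mean(scores + [0] * sacks):
        sacks += 1
    return sacks


print(f"Mean with one sack: {mean(real_students + [0]):.1f}%")
k = sacks_needed(real_students)
print("Sacks needed:", k)
print("Median without sacks:", median(real_students))
print("Median with", k, "sacks:", median(real_students + [0] * k))
```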

Imagine a country where the average income is over $\$1{,}000{,}000$ (US) per year. Would you want to live there? Sounds pretty good, doesn't it!

However, if this "average" is the mean, you have to be careful with this statistic. For example, let's say we have a country made up of $1{,}000{,}000$ people, and $999{,}900$ of these people are incredibly poor. Since the UN defines poverty as living on less than $\$1$ per day, which is $\$365$ per year, we'll say that these people all make $\$300$ per year. The other $100$ people are all as rich as Bill Gates, the richest man in the world, having around $\$80{,}000{,}000{,}000$ (that's $80$ billion US dollars) each.

What would the mean income for this country be? Would you still want to live there?
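Multiplying each group's income by the number of people in it saves adding up a million numbers one by one. This sketch uses the made-up country above; Python's whole numbers are exact, so the total has no rounding error and we only divide at the end. The variable names are our own.

```python
# The made-up country from the example above.
poor_people, poor_income = 999_900, 300          # dollars per year each
rich_people, rich_income = 100, 80_000_000_000   # dollars each, as in the example

total_people = poor_people + rich_people
# Python whole numbers are exact, so this total has no rounding error.
total_income = poor_people * poor_income + rich_people * rich_income

mean_income = total_income / total_people
print(f"Mean income: ${mean_income:,.2f} per year")
```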

This sort of situation is called a "bimodal distribution".

Normally, what you expect is a nice smooth distribution of poor and rich people, with most people somewhere in the middle and fewer people at the very poor and very rich ends, so that the mean lies in the middle.

However, in a bimodal distribution, you end up with two separate "humps" of people (in our country, one of very poor people and one of very rich people) with hardly anyone in between.

Here the mean lies in between the two groups. Since no-one actually earns the mean income, it seems silly to call it the "average"!

What would the median be in this country? Is this a better measure of "average" in this country?
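Here is a sketch that builds the full list of a million incomes for the made-up country and lets Python's `statistics.median` find the middle. With a million people (an even number), the median is halfway between the $500{,}000$th and $500{,}001$st incomes once they are lined up in order.

```python
from statistics import median

# One entry per person in the made-up country: 999,900 poor, 100 very rich.
incomes = [300] * 999_900 + [80_000_000_000] * 100

# statistics.median sorts the list; with an even number of people it
# returns the value halfway between the two middle incomes.
median_income = median(incomes)
print(f"Median income: ${median_income:,.2f} per year")
```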

The way the average can be affected by outliers and bimodal distributions may seem like something that only matters in your maths class, but in fact you can see examples of it everywhere. Here are a few:

The average life expectancy in Ancient China (as well as Ancient Rome and Ancient Greece) was around $35$ years. If that is the case, why do we hear about so many elderly people in history class? See if you can do some research and figure it out!

According to the most recent census, the mean income in the US is $\$69{,}821$ (US dollars), but the median income is only $\$50{,}502$. Why are these two numbers different? Which is more representative of the "average" American?