Australia VIC

VCE 11 General 2023

INVESTIGATION: The problem with average


Mean scores are often called "averages" in everyday life. Averages are used a lot, both inside and outside the classroom. For example, when your teacher hands back your test, they often tell you the average mark so that you can get a sense of how you performed compared with the rest of your class. However, you have to be careful when comparing scores to the mean, because the mean can sometimes be misleading.

A genius in your classroom

Let's look at an example of this. Suppose that your maths class has 10 students in it, and these are everyone's test scores:

Student	Score
Alice	70
Bob	65
Charlie	60
Daniel	55
Emily	50
Frank	61
Geoffrey	57
Harry	48
Isobel	72
You	63

Let's calculate the average mark for this class using the mean. To do this, we add up all the scores and then divide by the number of students, 10.

Mean	=	\frac{70 + 65 + 60 + 55 + 50 + 61 + 57 + 48 + 72 + 63}{10}
	=	60.1\%
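If you'd like to check this with a computer, here is a minimal Python sketch of the same calculation. The function name `mean` and the list name `original_class` are just illustrative choices, not part of the lesson; the list holds the ten scores from the table above, in the same order.

```python
def mean(scores):
    """Add up all the scores, then divide by how many scores there are."""
    return sum(scores) / len(scores)

# Alice, Bob, Charlie, Daniel, Emily, Frank, Geoffrey, Harry, Isobel, You
original_class = [70, 65, 60, 55, 50, 61, 57, 48, 72, 63]

print(sum(original_class))              # 601
print(round(mean(original_class), 1))   # 60.1
```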

Since the class average was around 60%, and your mark is 63%, you'd probably be feeling pretty good. You're above average!

However, the next day a new student joins your class. That student is Terence Tao (陶哲軒), an incredibly gifted Australian mathematician of Hong Kong ancestry. When Terence was 8 years old, he was teaching calculus to high school students, and he started university when he was just 14 years old. In 2014, he won a $3 million prize for his groundbreaking discoveries in mathematics. Needless to say, Terence would find your test fairly easy. Let's say everyone else gets the same score as before in the next test, and Terence gets 100%. Now your class's test scores look like this:

Student	Score
Alice	70
Bob	65
Charlie	60
Daniel	55
Emily	50
Frank	61
Geoffrey	57
Harry	48
Isobel	72
You	63
Terence	100

Now what is the average? Let's add them up and see.

Mean	=	\frac{70 + 65 + 60 + 55 + 50 + 61 + 57 + 48 + 72 + 63 + 100}{11}
	=	\frac{701}{11}
	\approx	63.7\%
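Here is a short Python sketch (again with illustrative names of our own) that recomputes the mean from all eleven scores in the table and compares your 63% with the mean before and after Terence arrives. It shows how much a single extra score can move the mean.

```python
def mean(scores):
    """Add up all the scores, then divide by how many scores there are."""
    return sum(scores) / len(scores)

original_class = [70, 65, 60, 55, 50, 61, 57, 48, 72, 63]
with_terence = original_class + [100]   # Terence joins and scores 100

before = mean(original_class)   # 601 / 10
after = mean(with_terence)      # 701 / 11

print(round(before, 1), round(after, 1))   # 60.1 63.7
print(63 > before, 63 > after)             # True False
```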

Now your score of 63% is below average. Oh no! But really, if we think about it, it doesn't make sense to be disappointed. You did just as well in the test, and having a genius in the same class as you shouldn't change how you look at your mark.

A score like Terence's, which is unusually high (or unusually low) compared with the rest of the data, is called an outlier. A single outlier can change the mean significantly, which is one of the biggest problems with using the mean to describe a set of data.

Question

Let's say that another new student arrives: John von Neumann. John was a Hungarian mathematician described by many as the smartest person of the 20th century. When he was only 6 years old, he could divide 8-digit numbers in his head and memorise pages of phone books, and when he grew up, he made huge discoveries in mathematics, physics and computing. What would happen to the mean if John also joined your class and got 100% on your test?

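One way to get around the problem of outliers affecting the mean is to use the median when describing data. The median is the middle value, so it depends only on the order of the scores and not on how extreme the highest or lowest scores are. This makes it much less affected by outliers than the mean. Here, the median is the score of the middle person in our class. To find it, we line the scores up in ascending order and pick the middle one. If there is an even number of scores, there are two middle scores, and the median is the mean of those two.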
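The "line them up and pick the middle" rule can be written as a short Python function. This is a sketch with a name of our own choosing, `median`: it sorts the scores, returns the middle score when there is an odd number of them, and averages the two middle scores when there is an even number. (Python's standard `statistics.median` follows the same rule.) The example uses the class with Terence in it, which has 11 scores.

```python
def median(scores):
    """Return the middle value of the scores once they are sorted."""
    ordered = sorted(scores)      # line the scores up in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                # odd number of scores: one middle score
        return ordered[middle]
    # even number of scores: average the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2

# The class with Terence in it (11 scores)
with_terence = [70, 65, 60, 55, 50, 61, 57, 48, 72, 63, 100]
print(median(with_terence))   # 61
```

You can use the same function to answer the questions below for the original class and for the class with both Terence and John.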

Questions

What was the median for the original class of 10 people?
What would the median be for the class with both Terence and John in it?
How does the change in the median compare with the change in the mean?

A secret plan to make everyone in your class above average

If your teacher uses the mean as the average for your class, you could make everyone above average using this simple trick. Tell your teacher that there are a few "new students" in your class on the day of the test, dress up a few sacks of sand in school uniforms, and sit them at desks.

The sacks of sand will all get 0% on the test (hopefully). Here are the scores of the original class, plus one sack of sand:

Student	Score
Alice	70
Bob	65
Charlie	60
Daniel	55
Emily	50
Frank	61
Geoffrey	57
Harry	48
Isobel	72
You	63
Sack of Sand #1	0
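To experiment with the plan, here is a Python sketch that adds a chosen number of 0% scores (the sacks of sand) to the ten original scores from the first table and checks whether every real student ends up above the new mean. The function name `everyone_above_mean` is hypothetical, and the sketch assumes every sack scores exactly 0.

```python
def mean(scores):
    """Add up all the scores, then divide by how many scores there are."""
    return sum(scores) / len(scores)

def everyone_above_mean(real_scores, sacks):
    """Add `sacks` scores of 0, then return (every real student beats the mean?, new mean)."""
    class_scores = real_scores + [0] * sacks
    class_mean = mean(class_scores)
    return all(score > class_mean for score in real_scores), class_mean

original_class = [70, 65, 60, 55, 50, 61, 57, 48, 72, 63]

# Try 0, 1, 2 and 3 sacks of sand and print the class mean each time
for sacks in range(4):
    above, class_mean = everyone_above_mean(original_class, sacks)
    print(sacks, round(class_mean, 1), above)
```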

Questions

What would be the mean for this class?
How many sacks of sand would you need so that all the students' scores are above the class mean?
How many sacks of sand do you think you would need for this to work in your real class at school?
Would this work if your teacher used the median to describe the data rather than the mean?

A country of millionaires

Imagine a country where the mean income is over $1,000,000 (US) per year. Would you want to live there? Sounds pretty good, doesn't it?

However, when this figure is a mean, you have to be careful with it. For example, let's say we have a country made up of 1,000,000 people, and 999,900 of these people are incredibly poor. Since the UN defines poverty as living on less than $1 per day, which is $365 per year, we'll say that these people all earn $300 per year. The other 100 people are all as rich as Bill Gates, and we'll say they each earn around $80,000,000,000 (that's 80 billion US dollars) per year.
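The mean income for this imaginary country can be worked out without listing a million separate incomes: multiply each income by the number of people who earn it, add the totals, and divide by the population. This Python sketch assumes, as in the example, that 999,900 people earn $300 per year and 100 people earn $80 billion per year; the variable names are our own.

```python
poor_people = 999_900
poor_income = 300                # dollars per year
rich_people = 100
rich_income = 80_000_000_000     # dollars per year (80 billion)

population = poor_people + rich_people
total_income = poor_people * poor_income + rich_people * rich_income
mean_income = total_income / population

print(population)                # 1000000
print(f"${mean_income:,.2f}")    # mean income per person, to the nearest cent
```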

Question

What would the mean income for this country be? Would you still want to live there?

Incomes like these form what is called a bimodal distribution.

Normally, we would expect a smooth spread of incomes from poor to rich, which looks more like a normal distribution.

A bimodal distribution is different, because it has two main clusters of values.

In this case, the mean lies between the two groups. Since no one actually has the mean as their income, it seems a little strange to call it "average"!

Question

What would the median be in this country? Is this a better measure of centre for this country?
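To find the median, one approach is to build the full list of incomes and let the `median` function from Python's standard `statistics` module find the middle value. A list of a million numbers is small enough for this to run quickly. The incomes are the same assumed figures as in the example above.

```python
import statistics

# 999,900 people on $300 per year and 100 people on $80 billion per year
incomes = [300] * 999_900 + [80_000_000_000] * 100

# With an even number of values, statistics.median averages the two middle ones
median_income = statistics.median(incomes)
print(median_income)
```

Compare the printed median with the mean you found earlier and think about which one better describes a typical person in this country.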

Extension

The average life expectancy in Ancient China (as well as Ancient Rome and Ancient Greece) was around 35 years. If that is the case, why do we hear about so many elderly people in history class? See if you can do some research and figure it out!
According to the 2014 census, the mean income in the US was $75,738 (US dollars), but the median income was only $55,613. Why are these two numbers different? Which is more representative of the "average" American?

Outcomes

U1.AoS1.4

mean x̄ and sample standard deviation s

U1.AoS1.5

construct and interpret graphical displays of data, and describe the distributions of the variables involved and interpret in the context of the data

U1.AoS1.7

construct and use parallel boxplots or back-to-back stem plots (as appropriate) to compare the distribution of a numerical variable across two or more groups in terms of centre (median), spread (range and IQR) and outliers, interpreting any observed differences in the context of the data
