Language and Use of Statistics

Lesson

When you want to know a statistic about a population, there are two ways of finding it out: a *census* or a *sample*.

For example, let's say that the school is deciding to get some new tables and chairs, and wants to make sure that the "average height" student will find them comfortable. They've asked you to figure out the average height of students in the school. Doing a census would mean getting the height of every single person in your school, and then averaging it. Doing a sample would mean getting the height of a smaller number of people (e.g. $50$ people), and using their average height as an estimate of the average height of the people in the school. If you had to choose, which would you prefer to do, a census or a sample?

Most people would prefer to do a sample, because it is so much quicker and easier. However, a sample can have problems of its own if you happen to choose a bad group of people to use as your "average". For example, see if you can figure out what is wrong with using these groups of people to figure out the average height of all students in your school:

1. Measure the heights of all the people in Year $7$

2. Measure the heights of all the boys

3. Measure the heights of the first $50$ people who walk out of the gate once school finishes

4. Measure the heights of the members of the school basketball team

All of these samples would be easier to do than a census, but would probably not get the right answer for the average height of students in the whole school. These are called *biased samples*, and need to be avoided at all costs when trying to figure out a statistic. There is no point in doing something quickly if you don't get the right answer! Biased samples are a serious problem, and can appear in a surprisingly large number of scenarios.
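To see how much a biased sample can throw off an estimate, here is a minimal Python sketch. The school in it is entirely made up: we assume six year levels of $150$ students each, with heights that rise by about $4$ cm per year level. The function names `make_school` and `mean_height` are just for this illustration. It compares a census, a random sample of $50$ students, and the "all of Year $7$" sample from the list above.

```python
import random
import statistics

def make_school(rng):
    """Build a hypothetical school as a list of (year_level, height_cm) pairs."""
    students = []
    for year in range(7, 13):
        # Assumption: the typical height is 150 cm in Year 7 and
        # increases by 4 cm for each year level after that.
        year_mean = 150 + 4 * (year - 7)
        for _ in range(150):
            students.append((year, rng.gauss(year_mean, 7)))
    return students

def mean_height(students):
    """Average height of a group of (year_level, height_cm) pairs."""
    return statistics.mean(height for _, height in students)

rng = random.Random(1)  # fixed seed so the run is repeatable
school = make_school(rng)

# Census: measure every student.
census_mean = mean_height(school)

# Random sample: 50 students chosen so everyone has an equal chance.
random_sample_mean = mean_height(rng.sample(school, 50))

# Biased sample: only the students in Year 7.
year7_mean = mean_height([s for s in school if s[0] == 7])

print(f"Census:        {census_mean:.1f} cm")
print(f"Random sample: {random_sample_mean:.1f} cm")
print(f"Year 7 only:   {year7_mean:.1f} cm")
```

In this made-up school, the Year $7$ sample contains three times as many students as the random sample, yet it lands much further from the census average. A bigger sample does not fix a biased one.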

See if you can come up with a non-biased sample for the average height of students in your school, and compare your methods with other students to find the best possible method.

Despite the difficulty of using samples, they are used frequently in the real world. For example, consider the Growth Charts published by the World Health Organisation. These charts show how much babies should weigh at a particular age. Have a look at the charts for girls and boys.

It is important that these charts are accurate, as they are used to help doctors identify babies who are unwell and need medical attention. Despite the importance of accuracy, a sample was used instead of a census.

Why do you think a census was not done in this case?

How could the following factors affect the results of the sample?

a. Number of babies

b. Site for recruiting babies

c. Method of recruiting the babies.
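For the first factor, the number of babies, a small simulation can help you explore the question. The sketch below is only an illustration: the population of $10000$ baby weights is invented (we assume weights centred on $7.5$ kg), and `spread_of_sample_means` is a hypothetical helper. It takes many random samples of a given size and measures how much their averages vary from one sample to the next.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumption: a made-up population of 10 000 baby weights in kg.
population = [rng.gauss(7.5, 1.0) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repeats=200):
    """Take `repeats` random samples and return the standard deviation
    of their averages. A smaller value means more consistent estimates."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(1000)

print(f"Samples of 10 babies:   averages vary by about {spread_small:.2f} kg")
print(f"Samples of 1000 babies: averages vary by about {spread_large:.2f} kg")
```

Try changing the sample sizes and see how the spread responds. Keep in mind that this only models random variation. It says nothing about bias from where or how the babies are recruited.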

See if you can come up with your own "perfect method" for trying to get a result that is as accurate as possible for how much an "average baby" should weigh at a given age.

a. How many babies would you recruit?

b. Where would you recruit the babies?

c. How would you recruit the babies?

Have a talk with people around you and see how your methods compare.

Don't worry if your method doesn't seem perfect. It is actually incredibly difficult to make a sample work accurately for this kind of thing, and it requires highly trained statisticians and lots of complicated maths. If you want to see just how complicated, have a look at this 336-page document which outlines the method taken by the World Health Organisation.

Let's say the government wants to find out the average income in Australia, as well as more specific information about how many rich and poor people there are. Should we use a sample or a census?

**Think** about the advantages and disadvantages of using a sample

**Think** about the advantages and disadvantages of using a census

Income is a very difficult thing to measure with a sample, as it is very easy to end up with a biased sample. See if you can figure out why the following samples would be biased:

1. Asking $1000$ people in Edgecliff

2. Asking $1000$ people in Utopia

3. Calling $1000$ random people from the phonebook of state capitals

4. Calling $1000$ random mobile phone numbers

5. Checking the tax returns of $1000$ random workers

6. Checking the yearly bank statements of $1000$ random people

Now, see if you can come up with a "perfect sample" which would have minimum bias.

In the end, the government uses both samples and a census to derive income data for Australia. In order to prepare the "Household Income and Income Distribution" publication, the Australian Bureau of Statistics uses many different sources, including three different surveys of around $10000$ households each and the data from the most recent census.

The reason for doing this is to combine the benefits of a sample with those of a census. Samples are relatively cheap and easy, which means they can be conducted frequently (for example, every year or every $3$ months). This keeps the information relevant and up to date. By contrast, a census is very expensive and time-consuming, and as a result is only conducted every $5$ years. However, a census has the most accurate information, being relatively free from bias. By using the accuracy of the census information to double-check or modify the information from the samples, it is possible to obtain data which is both accurate and timely.
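One simple way that census information could be used to "modify" a sample is to reweight it, so that each group counts in proportion to its true size. The sketch below illustrates that general idea only. It is not a description of the Australian Bureau of Statistics' actual method, and every number in it is invented.

```python
# Assumption: made-up census counts of households in two regions.
census_households = {"city": 700, "country": 300}

# Assumption: a made-up survey that reached far more city households
# than country households (90 out of 100 responses).
survey = {
    "city": {"n": 90, "mean_income": 80000},
    "country": {"n": 10, "mean_income": 50000},
}

# Raw survey average: each response counts equally, so the city dominates.
total_surveyed = sum(group["n"] for group in survey.values())
raw_mean = (
    sum(group["n"] * group["mean_income"] for group in survey.values())
    / total_surveyed
)

# Reweighted average: each region counts according to its census size.
total_households = sum(census_households.values())
weighted_mean = (
    sum(census_households[region] * survey[region]["mean_income"]
        for region in survey)
    / total_households
)

print(f"Raw survey average:        ${raw_mean:,.0f}")
print(f"Census-reweighted average: ${weighted_mean:,.0f}")
```

With these invented numbers, the raw average is pulled towards the over-represented city households, and the reweighted average corrects for that. Reweighting can only fix imbalances between groups that the census counts. It cannot fix a biased choice of people within a group.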

Carry out investigations of phenomena, using the statistical enquiry cycle:

A. conducting surveys that require random sampling techniques, conducting experiments, and using existing data sets

B. evaluating the choice of measures for variables and the sampling and data collection methods used

C. using relevant contextual knowledge, exploratory data analysis, and statistical inference.

Design a questionnaire