Did you know that by analysing a small sample that is representative of a population we can get a good idea of what the entire population is like? You have probably had experience with this before, whether knowingly or unknowingly. Consider the following scenario. You have just finished baking 100 cookies. Before you take them to your friend’s party, you have to check whether they taste good or not, so you decide to do a taste test. How many cookies should you eat to check? All of them? Of course not! If you ate all of them, there wouldn’t be any left for your friend’s party! It is enough to eat just a few of them because by doing this you are essentially using a sample to get an indication of what the whole population of 100 cookies tastes like.
What are some other instances of when samples may be used? Well, they can be used for measuring aspects of the economy such as how many people are unemployed at a given time, or gauging people’s opinions on a range of topics, as well as for testing water quality, testing for dangerous biological or radioactive hazards, etc. The list is endless.
In all these cases, the sample used must be representative of the whole population. For instance, if you wanted to taste test your batch of cookies, you’d have to taste both the burnt and non-burnt ones. Otherwise, if you’d tasted just the burnt ones you’d think that your entire batch of cookies tasted a bit off. If the sample you use is not representative, then it is said to be biased.
Let’s go through the concept of unrepresentative samples in a bit more detail. Samples can be unrepresentative either because of the way they were chosen or because of chance (bad luck). Choosing to taste only the burnt cookies was an instance of a sample being unrepresentative because of how it was chosen. What you should do instead is select cookies to taste at random (for example, by closing your eyes when choosing). That way your sample will likely contain a mixture of both burnt and non-burnt ones.
But notice that I said “likely”. This is because even if you chose a sample of cookies at random, the sample could still be unrepresentative as a result of chance. For example, through bad luck, you could pick just the burnt ones. As another example, consider a standard deck of cards. If you picked four cards at random from a deck that is face down on a table, you could, although it is very unlikely, select four kings purely through luck. Obviously, this sample of four kings is not representative of the entire deck, but this is the result of luck, not of the way you chose the sample of cards.
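How often would chance alone produce a sample like the four kings? The short calculation below is a sketch that assumes a standard 52-card deck and four cards drawn without replacement; the variable names are illustrative only.

```python
from math import comb

# Number of different 4-card selections from a standard 52-card deck.
total_selections = comb(52, 4)

# Exactly one of those selections consists of the four kings.
p_four_kings = 1 / total_selections

print(total_selections)  # 270725
print(p_four_kings)
```

So an all-kings sample is possible when choosing at random, but it turns up only about once in every 270,725 draws.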
So just to recap: a sample can be unrepresentative because of how it is chosen or because of chance. But it is only said to be biased if it is unrepresentative due to how it is chosen.
Sometimes simply choosing members from a group at random may not be enough to create a representative sample. In other words, a simple random sample may not be a representative sample. (A random sample is one in which every member of the population is equally likely to be selected to be in the sample.) For example, imagine you want to know the level of support for a women's health club that the local council is considering building, so you decide to carry out a survey. Now suppose 90% of your town is female and 10% is male. If you select a simple random sample, it will not necessarily contain females and males in the same proportions as the population (i.e. your town). Put simply, each gender may not be properly represented. You may be wondering why this is a problem. Well, it wouldn’t be a problem if a person’s gender had no effect on whether they support the building of the women's health club. But this is unlikely to be the case, since the health club will be exclusively for women and so you’d expect women to be more likely than men to favour building it. So for a sample to be representative in this instance, it has to include the same proportions of females and males as the town. That is, 90% of people in the sample should be female and 10% should be male. Such a sample is called a stratified sample.
Let’s go through exactly how we would calculate the exact number of females and males in a stratified sample of 50. To find the number of females, we simply multiply 90% (the proportion of females in the town) by 50 (the sample size). So we get 90% x 50 = 45 females. For the number of males, we get 10% x 50 = 5. But how do we select these 45 females and 5 males? Well, we simply select 45 females at random from the town’s females and select the 5 males at random from the town’s males.
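The two steps above (work out how many people each group contributes, then pick that many at random within the group) can be sketched in Python. The town of 1000 residents, the ID labels and the function name `stratum_sizes` are all made up for illustration.

```python
import random

def stratum_sizes(proportions, sample_size):
    """Number of people to draw from each group: proportion x sample size, rounded."""
    return {name: round(p * sample_size) for name, p in proportions.items()}

# Hypothetical town of 1000 residents: 900 females and 100 males.
town = {
    "female": [f"F{i}" for i in range(1, 901)],
    "male": [f"M{i}" for i in range(1, 101)],
}

population = sum(len(group) for group in town.values())
proportions = {name: len(group) / population for name, group in town.items()}

sizes = stratum_sizes(proportions, 50)
print(sizes)  # {'female': 45, 'male': 5}

# Select the required number at random from within each group.
sample = {name: random.sample(town[name], sizes[name]) for name in town}
```

The people chosen will differ from run to run, but the split between the groups is always 45 and 5.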
The aspect of the population that may be relevant when you select a stratified sample will vary depending on what you are trying to investigate. For example, if you want to find out what people think about the government’s new income tax policy, you would have to use a sample that properly represents people from each income bracket.
Sometimes you may have to categorise a population by more than one aspect. For example, you may have to categorise a population according to both gender and age. The table shows a breakdown of the population of students at ABC High School.
Gender | Age 12 | Age 13 | Age 14 | Age 15 | Age 16 | Age 17 | Total |
---|---|---|---|---|---|---|---|
Male | 76 | 80 | 78 | 76 | 83 | 77 | 470 |
Female | 74 | 75 | 69 | 71 | 65 | 76 | 430 |
Total | 150 | 155 | 147 | 147 | 148 | 153 | 900 |
If you want to calculate the number of students needed to represent each category, follow the same steps as in the example above: multiply the proportion of the population that the group makes up by the sample size. For example, the number of 13-year-old girls that should be included in a sample of 100 is 75/900 x 100 ≈ 8.3, which rounds to 8.
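The same calculation can be repeated for every gender and age category at once. The sketch below uses the counts from the ABC High School table; the function name `allocation` and the dictionary layout are illustrative choices, not a prescribed method.

```python
# Student counts from the ABC High School table, keyed by gender then age.
counts = {
    "Male":   {12: 76, 13: 80, 14: 78, 15: 76, 16: 83, 17: 77},
    "Female": {12: 74, 13: 75, 14: 69, 15: 71, 16: 65, 17: 76},
}

def allocation(counts, sample_size):
    """Students per category: (category count / population) x sample size, rounded."""
    total = sum(sum(row.values()) for row in counts.values())
    return {
        (gender, age): round(n * sample_size / total)
        for gender, row in counts.items()
        for age, n in row.items()
    }

alloc = allocation(counts, 100)
print(alloc[("Female", 13)])  # 8
print(sum(alloc.values()))    # 99
```

Because each category is rounded separately, the rounded numbers need not add up exactly to the intended sample size. Here they total 99 rather than 100, so one category would have to be adjusted by hand.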
Let’s go through an in-depth example of the use of a sample in a real life context. Consider the following question. How many people watched the football game on TV on the weekend? How would you go about answering this question? While it would be theoretically possible for you to count the actual number of people tuned in to the game, this wouldn’t be practical given the large number of people who watch TV. This is where samples come in. Not only are they more efficient, but they also enable us to gain insight easily into the demographics of the viewing population such as age, gender, etc.
But how do television ratings companies create samples? How do they decide who is included in them?
Basically, a television ratings company creates a sample audience by selecting a random sample that is representative of all viewers. The exact process may vary slightly for each company, but the essential point is that the sample they gather has to reflect the demographics of the whole population. Once a sample of households has been selected, meters are then installed in their homes to track when their TVs are on and what programs they are watching. From this process, the company can track the number of people watching a particular program and can then project the result to the entire population. For example, if they wanted to know how many people watched a football match, they simply count the number of people in the sample audience that watched the match, and then generalise from this sample audience to estimate how many people in the whole population watched the match. Not only that, they can also estimate the age, gender, city and other characteristics of the viewers.
How many people do you think a ratings company has to include in the sample for it to be confident of its predictions? Do you think it would be in the millions or in the hundreds of thousands? Or in the tens of thousands maybe? In reality, a random sample of about 3000 people already gives a margin of error of only about 2 percentage points at 95% confidence (about 2.4 points at 99% confidence).
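A rough check of how sample size relates to margin of error uses the standard formula for a proportion, z x sqrt(p(1 - p)/n). This is only a sketch. It assumes a simple random sample from a large population, the worst case p = 0.5, and the usual normal-approximation values z = 1.96 (95% confidence) and z = 2.576 (99% confidence). Real ratings panels may be designed differently.

```python
from math import sqrt

def margin_of_error(n, z=1.96, p=0.5):
    """Approximate margin of error for a proportion estimated from n people."""
    return z * sqrt(p * (1 - p) / n)

moe_95 = margin_of_error(3000)           # 95% confidence
moe_99 = margin_of_error(3000, z=2.576)  # 99% confidence

print(round(100 * moe_95, 1))  # 1.8 (percentage points)
print(round(100 * moe_99, 1))  # 2.4 (percentage points)
```

Note that the size of the population does not appear in the formula. This is why a few thousand people can stand in for millions of viewers.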
Let’s imagine Mitch wants to find out the favourite colour of students in his maths class. He doesn’t want to spend a lot of time on this investigation so instead of asking every single person in his class, he decides to harness the power of sampling and ask only a few of them. He decides to only ask 8 of them. He knows that he’s supposed to select the students to be included in this sample “at random”, but how exactly does one go about doing this?
Should Mitch look at the class roll and simply pick out a few names? The answer is no, because his choices may be affected by subtle personal preferences that he isn't even aware of. The people he selects might, for example, be the ones he is friends with, or the ones he hates the most, or the ones that he doesn't know so well. The sample would then not be representative and the results would be misleading. If the sample was made up of only avid Manchester United football fans, for example, then he might mistakenly think that the most popular colour among students was red.
A better way to select a random sample is to allocate a number to each student such as by allocating a number from 1 to 30 as we go down the class roll. It doesn’t matter how the numbers are allocated. We can then select students to be included in the sample by using any method that produces random numbers. One of the easiest methods would be to use the random number generator on a (scientific) calculator.
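A software random number generator does the same job as the calculator. As a minimal sketch, assuming a class of 30 students numbered 1 to 30 and a sample of 8, Python's standard `random.sample` picks the numbers without repeats:

```python
import random

class_size = 30   # students are numbered 1 to 30 down the class roll
sample_size = 8   # Mitch wants to ask 8 students

# Choose 8 different student numbers, each student being equally likely.
chosen = sorted(random.sample(range(1, class_size + 1), sample_size))
print(chosen)
```

The output changes on every run, which is exactly the point: no personal preference can influence which students are picked.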
But the method we will discuss in detail is the use of random number tables, which as the name suggests are just tables of random numbers. They are made up of the digits from 0 through 9. To use them, we need to select a starting point. It doesn’t matter where this is and it doesn’t matter whether numbers are chosen by going across rows, down columns, diagonally or any other weird way.
2850 | 3289 | 2934 | 6201 | 4773 |
4189 | 6380 | 1827 | 1269 | 6104 |
8661 | 2506 | 5929 | 6430 | 0293 |
3276 | 4302 | 1470 | 1736 | 8924 |
2539 | 0794 | 7365 | 1809 | 0472 |
Here we describe one of many possible ways. Suppose we start at the intersection of the 3rd row and 3rd column and decide to read across rows. To get 8 numbers (for Mitch’s sample of 8) at random from the table, we read from left to right from 5929 onwards, split the digits into two-digit numbers, and keep those from 01 to 30 (since the students in Mitch’s class have been allocated numbers from 1 to 30 only), ignoring any number that has already come up.
Let’s go through this step-by-step. Reading from left-to-right, we get
59 29 64 30 02 93 32 76 43 02 14 70 17 36 89 24 25 39 07 94 73 65 18 09 04 72
We ignore the second “02” since it is a repeat, and we use only the numbers that are less than or equal to 30. This leaves
29 30 02 14 17 24 25 07 18 09 04
Since we are after 8 students to include in the sample, we only need the first 8 numbers. This leaves
29 30 02 14 17 24 25 07
We can then use these numbers to find the names of the students they correspond to.
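The procedure just described can also be written out as a short program. This is a sketch of the one reading order used above (across rows, in pairs of digits); the function name `read_sample` and its parameters are hypothetical.

```python
# The random number table, one string per row.
table = [
    "2850 3289 2934 6201 4773",
    "4189 6380 1827 1269 6104",
    "8661 2506 5929 6430 0293",
    "3276 4302 1470 1736 8924",
    "2539 0794 7365 1809 0472",
]

def read_sample(table, start_row, start_col, max_number, sample_size):
    """Read two-digit numbers across rows from the given (1-based) position."""
    blocks = [block for row in table for block in row.split()]
    blocks_per_row = len(table[0].split())
    start = (start_row - 1) * blocks_per_row + (start_col - 1)
    digits = "".join(blocks[start:])

    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        # Keep numbers from 1 to max_number and skip repeats.
        if 1 <= number <= max_number and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

print(read_sample(table, 3, 3, 30, 8))  # [29, 30, 2, 14, 17, 24, 25, 7]
```

Starting from a different row and column, or reading in a different direction, would give a different but equally valid random sample.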
1. Rob, Nick and Marcel are having an argument about what the most popular sport among students at their school is. Rob thinks it is football, Nick thinks it is netball, while Marcel thinks it is cricket. To settle the argument once and for all, they decide to carry out a survey. But instead of collecting the data together as a group, they each go out on their own. Rob asks his teammates in the school football team, Nick asks members of the girls' dancing class and Marcel asks everyone who is stuck in after-school detention with him. The following table shows the results of their investigation.
Student | % football | % netball | % cricket |
---|---|---|---|
Rob | 90 | 5 | 5 |
Nick | 0 | 80 | 20 |
Marcel | 30 | 30 | 40 |
i) Rob
ii) Nick
iii) Marcel
2. Britney is in charge of coming up with the town’s budget. One thing she is undecided on is whether to spend money on a new music hall for the town’s residents. So she decides to carry out a survey to gauge the residents’ support for building a new music hall. She is also unsure how to select the sample for this survey and so consults her husband who suggests the following methods. For each method, comment on whether the sample gathered would be representative and suggest changes that could be made to improve the method.
3. Imagine that you have been put in charge of investigating students’ satisfaction with the performance of your school principal. How would you go about selecting a representative sample from the entire student population? Consider who you will ask, the questions you will ask them, and where and when you will ask them.
4. Are the samples in the following instances representative samples? If they are not, who or what should have been included for them to have been representative?
5. Principal Chris is considering building a new basketball court for students to use during lunchtimes, but first he has to find out whether students support the idea. So he decides to personally interview students to get their thoughts on the idea. But due to time constraints he is only able to directly talk to 120 students.
The following table shows the number of students in each year group at a particular high school which has 900 students.
Year | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|
Number of students | 176 | 180 | 164 | 171 | 101 | 108 |
6. Principal Chris would also like to interview some students to find out the most popular sport among students at the school. But again, given his lack of time he is only able to talk with a few of them. Not only that, he does not even have the time to select the sample himself, so he asks his secretary Mabel to select a sample for him. Instead of selecting random students by year group as Chris did previously, Mabel selects a random sample that is not stratified. That is, she selects students from the entire school at random.
7. The table shows a breakdown of the population of students at ABC High School.
Gender | Age 12 | Age 13 | Age 14 | Age 15 | Age 16 | Age 17 | Total |
---|---|---|---|---|---|---|---|
Male | 76 | 80 | 78 | 76 | 83 | 77 | 470 |
Female | 74 | 75 | 69 | 71 | 65 | 76 | 430 |
Total | 150 | 155 | 147 | 147 | 148 | 153 | 900 |
How many students should be included from each of the 12 categories to form a representative sample of 90?
8. Imagine you have just won 5 boxes of chocolates as part of a competition. Unfortunately, you do not eat chocolate, so you decide to give away the 5 boxes to students in your class. You want to be fair and so you decide to select 5 students at random to give the boxes of chocolates to. Use the following list of random numbers to select a random sample of 5 students from your class.
4284 | 3762 | 5710 | 9375 | 0829 |
6235 | 1286 | 4208 | 1678 | 3014 |