Sampling and Bias (Investigation)

In 1936 the US magazine Literary Digest ran a poll to predict whether Alf Landon or Franklin Roosevelt would win the US presidential election. It sent out 10 million questionnaires and got 2,266,566 responses back. The results of the poll indicated a victory for Landon with 57% of the vote. But in the actual election, Roosevelt won in a landslide with 63% of the vote, with Landon receiving only 37%. How could Literary Digest get it wrong, and by such a large margin?

Meanwhile, George Gallup conducted a much smaller poll of only 50,000 people and correctly predicted Roosevelt’s victory. Not only that, he also predicted the incorrect result of the Literary Digest poll, using a random sample that was smaller than theirs but chosen to match the demographics of the people they polled.

How Literary Digest got it all wrong

The moral here is that the sampling method is much more important than the sample size. There were two problems with the way Literary Digest constructed its sample. First, it didn’t use a random sample, so the sample was not representative of the population. Instead, it drew its sample from car registrations, telephone directories, country club memberships and magazine subscriptions. In 1936, in the midst of the Great Depression, people who had cars, phones, magazine subscriptions, etc. were more likely to be wealthy and to vote Republican.
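The claim that sampling method matters more than sample size can be checked with a small simulation. The sketch below is only an illustration: the population size, the share of "wealthy" voters and the support rates in each group are invented numbers, not historical data, and `simulate_polls` is a hypothetical helper name.

```python
import random


def simulate_polls(seed=0):
    """Compare a large sample from a biased frame with a small random sample."""
    rng = random.Random(seed)

    # ASSUMPTION (invented for illustration): 25% of voters are "wealthy";
    # 40% of wealthy voters and 70% of other voters support Roosevelt.
    population = []
    for _ in range(200_000):
        wealthy = rng.random() < 0.25
        supports_roosevelt = rng.random() < (0.40 if wealthy else 0.70)
        population.append((wealthy, supports_roosevelt))

    def roosevelt_share(voters):
        return sum(supports for _, supports in voters) / len(voters)

    # Biased sampling frame: only wealthy voters can be selected.
    biased_frame = [voter for voter in population if voter[0]]
    big_biased_sample = rng.sample(biased_frame, 20_000)

    # Simple random sample drawn from the whole population.
    small_random_sample = rng.sample(population, 1_000)

    return {
        "true": roosevelt_share(population),
        "big_biased": roosevelt_share(big_biased_sample),
        "small_random": roosevelt_share(small_random_sample),
    }


if __name__ == "__main__":
    for name, share in simulate_polls().items():
        print(f"{name}: {share:.1%}")
```

With these made-up numbers the population share is about 62.5% by construction. The 20,000-voter sample from the biased frame lands near 40%, while the 1,000-voter random sample typically falls within a few percentage points of the true value.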

But the selection bias produced by this particular sample was not as severe as the nonresponse bias. From the 10 million questionnaires sent out, they only got 2,266,566 back – a response rate of just under 23%. Those who had strong opinions, particularly those who were dissatisfied with Roosevelt’s performance as president and wanted change, were more likely to respond, while those who were satisfied were less inclined to complete the questionnaires.
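The response rate quoted above is simple arithmetic on the two figures given in the text. A minimal sketch (the function name `response_rate` is just an illustrative choice):

```python
def response_rate(returned, sent):
    """Fraction of questionnaires that came back."""
    return returned / sent


sent = 10_000_000       # questionnaires mailed out (from the text)
returned = 2_266_566    # questionnaires returned (from the text)

rate = response_rate(returned, sent)
print(f"Response rate: {rate:.1%}")             # 22.7%
print(f"Did not respond: {sent - returned:,}")  # 7,733,434
```

More than three quarters of the people contacted never replied, so the poll reveals nothing about how that majority intended to vote.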

These are just two of many biases that sampling methods can be subject to. A related bias is the voluntary response bias, which arises when the respondents themselves decide whether to join the sample. A good example of this is when TV and radio stations try to gauge public opinion by asking their viewers or listeners to call in or to participate in an online poll. Worse still are reality TV shows like Australian Idol that ask viewers to vote for their favourite contestant. It is important to keep in mind that the winner is not necessarily the contestant with the greatest number of admirers among viewers, because the people who vote tend to feel strongly about particular contestants and so are not representative of all viewers. The problem is made worse by the fact that any person can vote an unlimited number of times.

To see the problem presented by the voluntary response bias, consider the following example. There was a US television poll that asked viewers “Do you support the President’s economic plan?” (The president at the time was Bill Clinton.) The table below shows the results of this poll and the results of a properly conducted survey by a market research company.

	Television Poll	Proper Survey
Yes	42%	75%
No	58%	18%
Not sure	0%	7%

As you can tell, there is a big difference between the two sets of numbers and this is due to the voluntary response bias in the television poll. The respondents themselves chose to be included in the sample and, as the results show, most of these respondents did not support the plan. Furthermore, there was no “Not sure” option available for respondents to choose, which only made the results of the television poll even more misleading.
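One way to see how self-selection could produce such a gap is to assume that opponents of the plan were more motivated to call in than supporters. In the sketch below, the opinion shares are the "Proper Survey" figures from the table, but the call-in probabilities are invented purely for illustration, and `call_in_result` is a hypothetical helper.

```python
def call_in_result(opinion_share, call_prob):
    """Expected poll percentages when each group decides whether to call in."""
    callers = {k: opinion_share[k] * call_prob[k] for k in opinion_share}
    total = sum(callers.values())
    return {k: v / total for k, v in callers.items()}


# Opinion shares taken from the "Proper Survey" column of the table.
opinion_share = {"yes": 0.75, "no": 0.18, "not sure": 0.07}

# ASSUMPTION (invented numbers): opponents are far more likely to call in,
# and undecided viewers cannot take part because there is no "Not sure" option.
call_prob = {"yes": 0.02, "no": 0.11, "not sure": 0.0}

result = call_in_result(opinion_share, call_prob)
for answer, share in result.items():
    print(f"{answer}: {share:.0%}")
```

Under these assumed call-in rates, about 43% of callers say "yes" and about 57% say "no", even though three quarters of all viewers support the plan. The actual call-in behaviour of the television audience is not known, so this only shows that a modest difference in motivation is enough to reverse the result.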

Another kind of sampling that leads to unrepresentative data is convenience sampling. With convenience sampling, the researcher selects the people who are easiest to recruit. A good example of this would be trying to gauge public opinion on an issue by asking a few of your friends for their thoughts, since they are easy for you to get hold of. Another example would be conducting a survey at a shopping centre, where only the people who happen to be shopping there at the time can be included.

A response bias occurs when the phrasing of questions leads people to give a response that doesn’t reflect their true beliefs. For example, a Roper poll in 1993 asked “Does it seem possible or does it seem impossible to you that the Nazi extermination of the Jews never happened?” The relatively high number of responses for “possible” shocked many people. But the question contains a confusing double negative, so many of the people who responded may not have fully understood it and hence may not have given a response that reflected what they truly believed. This suspicion was tested when Roper conducted the same poll with a simpler, revised question: “Does it seem possible to you that the Nazi extermination of the Jews never happened, or do you feel certain that it happened?” The table below shows the responses to both questions.

Original poll response	Share	Revised poll response	Share
Impossible	65%	Certain that it happened	91%
Possible	22%	Possible that it never happened	1%

Discussion

A local council wants to know people’s opinions about the mayor’s performance and so decides to hold a community forum, inviting members of the public to attend and voice their opinions.

What is the sampling frame used in this study?
What is the population of interest?
Is the sampling frame representative of the population of interest? If not, what is the potential bias that may arise from its use?
How will constructing a biased sampling frame lead to an inaccurate measure of community opinion?
How would you construct the sampling frame if you were in charge?

Discussion

As part of an assignment you are required to determine the most supported NRL team among students at your school.

Give an example of a method that would be subject to a voluntary response bias.
Give an example of a method that would be susceptible to a nonresponse bias.
Give an example of a method that would be subject to a selection bias.
Give an example of a method that would be free of any form of bias.

Discussion

Explain why the following samples are biased:

A survey is attached to a women's magazine and asks readers about their support for the idea of unpaid maternity leave. Readers are asked to return the survey by mailing it back to the headquarters of the magazine’s publishers.
A television reporter gauges the popularity of the Australian and English cricket teams by standing outside the Sydney Cricket Ground during a test match between the two countries and asking people who enter or leave the ground about which team they support.
The NSW health minister attempts to measure the percentage of doctors that bulk-bill by contacting 100 doctors randomly selected from the Yellow Pages.
A reporter from A Current Affair approaches randomly selected people on a street in the Sydney CBD to ask for their opinions on the latest decision by Fair Work Australia to increase the federal minimum wage.
A market research company contacts 100,000 randomly selected people by phone to participate in a survey, but only 1,000 agree to participate.

Discussion

After a US politician proposed offering driver’s licenses to illegal immigrants, CNN ran a poll asking viewers “Would you be more or less likely to vote for a presidential candidate who supports giving drivers’ licenses to illegal aliens?” Given the loaded nature of the question, it is not at all surprising that 97% of people responded with “less likely.”

Rewrite the question so that the response bias does not exist.
Besides the response bias, what other bias was the poll subject to?

Outcomes

11C.D.1.4

Describe and compare sampling techniques; collect one-variable data from primary sources, using appropriate sampling techniques in a variety of real-world situations; and organize and store the data
