
Investigation: All about a survey

Lesson

Bias from samples and surveys

The Literary Digest poll

In 1936 the US magazine Literary Digest ran a poll to predict who would win the US presidential election: Alf Landon or Franklin Roosevelt. It sent out 10 million questionnaires and received 2,266,566 responses. The results of the poll indicated a victory for Landon with 57% of the vote. But in the actual election, Roosevelt won in a landslide with 63% of the vote, with Landon receiving only 37%. How could Literary Digest get it wrong, and by such a large margin?

Meanwhile, George Gallup conducted a much smaller poll of only 50,000 people and correctly predicted Roosevelt’s victory. Not only that, he also correctly predicted the incorrect result of the Literary Digest poll itself, using a random sample far smaller than theirs but chosen to match the demographics of the Digest’s respondents.

How Literary Digest got it all wrong

The moral here is that the sampling method matters far more than the sample size. There were two problems with the way the Literary Digest constructed its sample. First, it didn’t use a random sample, so the sample was not representative of the population. Instead, it selected its sample from car registrations, telephone directories, country club memberships and magazine subscriptions; and at that time, in the midst of the Great Depression, the people who had cars, phones, magazines and so on were more likely to be wealthy and to vote Republican.

But the selection bias produced by this particular sample was not as severe as the nonresponse bias. Of the 10 million questionnaires sent out, only 2,266,566 came back – a response rate of just under 23%. Those who had strong opinions, particularly those who were dissatisfied with Roosevelt’s performance as president and wanted change, were more likely to respond, while those who were satisfied were less inclined to complete the questionnaires.
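The response-rate arithmetic above is easy to verify directly; here is a quick sketch in Python using the figures given in the text:

```python
# Literary Digest poll figures from 1936 (as stated above)
questionnaires_sent = 10_000_000
responses_received = 2_266_566

# Response rate = responses received / questionnaires sent
response_rate = responses_received / questionnaires_sent
print(f"Response rate: {response_rate:.1%}")  # about 22.7%, i.e. just under 23%
```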

These are just two of the many biases that sampling methods can be subject to. A related bias is voluntary response bias, which arises when the respondents themselves decide whether to join the sample. A good example is when TV and radio stations try to gauge public opinion by asking their viewers or listeners to call in or to participate in an online poll. Worse still are reality TV shows like Australian Idol that ask viewers to vote for their favourite contestant. It is important to keep in mind that the winner is not necessarily the contestant with the greatest number of admirers among viewers: the people who vote tend to feel strongly about particular contestants and so are not representative of all the viewers. The problem is made worse by the fact that any one person can vote an unlimited number of times.

To see the problem presented by the voluntary response bias, consider the following example. There was a US television poll that asked viewers “Do you support the President’s economic plan?” (The president at the time was Bill Clinton.) The table below shows the results of this poll and the results of a properly conducted survey by a market research company.

              Television poll    Proper survey
  Yes         42%                75%
  No          58%                18%
  Not sure    0%                 7%

As you can see, there is a big difference between the two sets of numbers, and this is due to the voluntary response bias in the television poll. The respondents themselves chose to be included in the sample and, as the results show, most of them did not support the plan. Furthermore, the television poll offered no “Not sure” option, which made its results even more misleading.
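The mechanism behind voluntary response bias can be illustrated with a small simulation. The numbers below are made up for illustration (they are not the actual Clinton-poll figures): suppose 75% of viewers genuinely support the plan, but opponents feel more strongly and are therefore much more likely to phone in.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical assumptions, not real data:
population_size = 100_000
support_rate = 0.75          # true level of support in the population
p_respond_if_support = 0.02  # supporters rarely bother to call in
p_respond_if_oppose = 0.10   # opponents are motivated to call in

# Each viewer decides independently whether to respond to the poll.
responses = []
for _ in range(population_size):
    supports = random.random() < support_rate
    p_respond = p_respond_if_support if supports else p_respond_if_oppose
    if random.random() < p_respond:
        responses.append(supports)

# Support measured among the self-selected respondents only.
poll_support = sum(responses) / len(responses)
print(f"True support in the population: {support_rate:.0%}")
print(f"Support among voluntary respondents: {poll_support:.0%}")
```

Even though three quarters of the population support the plan, the self-selected sample shows a clear majority against it, because who chooses to respond is not independent of their opinion.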

Another kind of sampling that leads to unrepresentative data is convenience sampling, in which the researcher selects whoever is easiest to recruit. A good example would be trying to gauge public opinion on an issue by asking a few of your friends for their thoughts, since they are easy to get hold of. Another example would be conducting a survey in a shopping centre.

Response bias occurs when the phrasing of questions leads people to give a response that doesn’t reflect their true beliefs. For example, a Roper poll in 1993 asked “Does it seem possible or does it seem impossible to you that the Nazi extermination of the Jews never happened?” The relatively high number of “possible” responses shocked many people. But given the convoluted structure of the sentence, many respondents may not have fully understood the question and hence may not have given a response that reflected what they truly believed. This suspicion was tested when Roper repeated the poll with a simpler, revised question: “Does it seem possible to you that the Nazi extermination of the Jews never happened, or do you feel certain that it happened?” The table below shows the results for both questions.

  Original poll            Revised poll
  Impossible    65%        Certain that it happened          91%
  Possible      22%        Possible that it never happened   1%

 

Posed below are a number of discussion questions.

Discussion 1

A local council wants to know people’s opinions about the mayor’s performance and so decides to hold a community forum, inviting members of the public to attend and voice their opinions.

  • Is a sample or census used in this study?
  • What is the population of interest in this situation?
  • Will the sample be representative of the population of interest? If not, what is the potential bias that may arise from its use?
  • How will obtaining a biased sample lead to an inaccurate measure of community opinion?
  • How would you obtain the sample if you were in charge?
Discussion 2

As part of an assignment you are required to determine the most supported NRL team among students at your school.

  • Give an example of a method that would be subject to a voluntary response bias.
  • Give an example of a method that would be susceptible to a nonresponse bias.
  • Give an example of a method that would be subject to a selection bias.
  • Give an example of a method that would be free of any form of bias.
Discussion 3

Explain why the following samples are biased:

  • A survey is attached to a women's magazine and asks readers about their support for the idea of unpaid maternity leave. Readers are asked to return the survey by mailing it back to the magazine publisher’s headquarters.
  • A television reporter gauges the popularity of the Australian and English cricket teams by standing outside the Sydney Cricket Ground during a test match between the two countries and asking people who enter or leave the ground about which team they support.
  • The NSW health minister attempts to measure the percentage of doctors who bulk-bill by contacting 100 doctors randomly selected from the Yellow Pages.
  • A reporter on A Current Affair approaches randomly selected people on a street in the Sydney CBD to ask for their opinions on the latest decision by Fair Work Australia to increase the federal minimum wage.
  • A market research company contacts 100000 randomly selected people by phone to participate in a survey, but only 1000 agree to participate.
Discussion 4

Following the proposal in the US by a politician to offer driver’s licenses to illegal immigrants, there was a poll on CNN asking viewers “Would you be more or less likely to vote for a presidential candidate who supports giving drivers’ licenses to illegal aliens?” Given the loaded nature of the question it is not at all surprising that 97% of people responded with “less likely.”

  1. Rewrite the question so that the response bias does not exist.
  2. Besides the response bias, what other bias was the poll subject to?

Survey design

The saying "There are three kinds of lies: lies, damned lies, and statistics" is often attributed to Benjamin Disraeli. As the quote suggests, one must be cautious when dealing with statistics, as they can be accidentally or deliberately misleading. This is especially true of statistics based on surveys, as surveys are particularly prone to problems of poor design. Let's have a look at some of the ways a survey can be poorly designed:

Biased questions and answers

The questions or the answer choices can be biased by the type of language used. For example, imagine a survey which asked "Do you support Australia's participation in anti-terrorist operations overseas?".

If we wanted respondents to answer yes to this question, we might reword it as "Do you support Australia's participation in an international alliance to prevent the spread of terrorism?", which carries a more positive connotation. Alternatively, we could ask "Do you support Australia's decision to risk the lives of our soldiers by invading foreign countries?", which carries a more negative connotation.

Whilst die-hard supporters of intervention or non-intervention will be unlikely to be swayed by such tricks, people who are "on the fence" or have no real opinion on the matter may be influenced by the way a question is asked, particularly when the only possible answers are "yes" or "no". With our results, we could then argue that Australians do/do not support participation in anti-terrorist operations overseas, and publish news articles or political statements using our "evidence". 

Task 1 - Question writing

Part 1a

Write your own survey question with neutral wording.

Part 1b

Now change this question to have a positive bias (try to get your respondents to answer yes).

Part 1c

Now change this question to have a negative bias (try to get your respondents to answer no).

Part 1d

Test out each version of your survey question with a different group of students, and see if the wording actually does change how people answer. 

 

Confusing or ambiguous questions

Questions may be phrased in a way that makes it difficult to understand what they are trying to ask.

A particularly common case is the "double negative", for example "Do you disagree with the position that Australia does not need high speed fibre optic internet?". Someone reading this quickly might not spot the double negative and could answer the opposite of what they intend.

Ambiguity in a question can also confuse respondents. For example, a question might ask "Do you think primary school students having access to a smart phone is a good thing?". Some people may interpret this as meaning that the students would be able to use their parents' phones, whilst others may interpret this as meaning that the students would own their own phones. 

Task 2 - Double Negatives and Ambiguity

Part 2a

Have a go at writing your own double negative question.

Part 2b

Now try writing an ambiguous question, and describe the two (or more) ways in which people might interpret it. 

 

Double-barrelled questions

These are questions which ask two things at once. For example, "Do you support school students spending less time at school and doing more homework?". Some people might be happy with less time at school, but against more homework. 

Task 3

Part A 

Write your own double-barrelled question, then show the proper approach by separating it into two separate questions. 

 

Privacy and ethics

As surveys are usually completed on a voluntary basis, it is important not to ask questions which are too sensitive, or else to provide a "prefer not to answer" option, so that respondents do not discard the survey entirely to avoid answering. Questions relating to race, religion, income, drug and alcohol use, medical conditions and sexuality are all types of questions which should be dealt with in a sensitive manner. For example, you may have noticed that government forms which ask "Are you Aboriginal or Torres Strait Islander?" always have an option for "prefer not to answer". 

The reasons why people would be concerned about answering such questions vary greatly according to the type of survey and the context in which it is asked.

Task 4

Discuss amongst yourselves some scenarios where people may find answering survey questions problematic. 

 

A real life example

In case you're thinking that no one would be foolish enough to make mistakes like these when designing a real survey, take a look at this survey which the National Rifle Association sent to its members.

Task 5

Part A

What problems can you find with the design of the National Rifle Association's survey?

Part B

Why might the survey questions be asked in this way?

Part C

How could you rewrite them?

 

Summary

So far we've seen that statistics produced from surveys can be seriously flawed, even to the extent that they can be used to deliberately lie. So what should you do when you see a newspaper article that says, for example, "74% of Australians believe that a carbon tax is a bad idea"? Well, the ideal would, of course, be to try to find a copy of the survey itself online, which you can then check yourself for poor survey design. However, it would be very difficult to do this for every statistic you see.

A good shortcut is to check who conducted the survey. Well-funded, politically neutral organisations are usually fairly good at designing surveys which are well made and free from bias. If the above statistic came from the Australian Climate Council, one could be fairly confident in its findings. On the other hand, if the statistic came from the Australian Coal Association, one would have to be a lot more suspicious that deliberately poor survey design played a part in its findings.

However, it is always worth looking up the organisation, as some lobby groups choose deliberately misleading names - for example, the Global Climate Coalition was the name of a lobby group made up mainly of oil firms seeking to prevent action on global warming. Indeed, it is standard practice for front groups to use neutral-sounding names like "Institute of Research" or "Centre for ... Studies" to mask their biases. 

In the end, when it comes to statistics you can never be too careful! 

 

Outcomes

ACMEM133

investigate questionnaire design principles; for example, simple language, unambiguous questions, consideration of number of choices, issues of privacy and ethics, and freedom from bias

ACMEM134

describe the faults in the collection of data process

ACMEM135

describe sources of error in surveys; for example, sampling error and measurement error

ACMEM137

investigate errors and misrepresentation in surveys, including examples of media misrepresentations of surveys
