Statistics can be used to answer many questions we have about the world in which we live. However, we need to make sure the collected data is accurate. In this lesson, we will learn how to design a plan for gathering data that accurately addresses the question being asked.
In statistics, a population refers to every member within any particular group of interest. A survey conducted on every member of a population is called a census.
Collecting data from every member of a population is the most accurate way of gathering information, but it is not always the most practical and can be very expensive. Collecting data from a subset of the population, called a sample, can be quicker and less expensive.
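When summarizing the collected data, we use different terms depending on whether the data came from a sample or from the whole population. A number that summarizes data from a population is called a parameter, and a number that summarizes data from a sample is called a statistic.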
If the methods of collecting the data were unbiased and the sample was representative of the population, the statistic may be used to understand the population. When we apply a statistic to a population, we are making an inference.
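The difference between a number computed from the whole population and a number computed from a sample can be illustrated with a short Python sketch. The population below is made-up data (simulated heights), and the names `population`, `parameter`, and `statistic` are chosen for illustration only.

```python
import random
from statistics import mean

# Hypothetical population: heights (cm) of 1,000 people, simulated for illustration.
rng = random.Random(0)
population = [rng.gauss(170, 10) for _ in range(1000)]

# Summary of the whole population (what a census would give us).
parameter = mean(population)

# The same summary computed from a random sample of 50 members.
sample = rng.sample(population, 50)
statistic = mean(sample)

print(f"population mean: {parameter:.1f}")
print(f"sample mean:     {statistic:.1f}")
```

Because the sample was chosen at random, the sample mean will usually land close to the population mean, which is what makes an inference from the sample reasonable.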
There are three methods of statistical design that can be used: a sample survey, an observational study, or an experiment.
A sample survey is best when the information must be provided by a person. Popular sample surveys ask about an opinion, a feeling, or a preference.
An observational study is best for determining a correlation between two variables of interest, but it cannot determine whether one factor was a cause of another factor.
An experiment is best for determining cause-and-effect relationships. In an experiment, subjects are separated into a control group and an experimental group. A treatment is applied to the experimental group but not to the control group. This helps determine whether the treatment was the cause of any differences between the experimental group and the control group.
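As a minimal sketch of how subjects might be separated into the two groups, the Python snippet below shuffles a list of subjects and splits it in half. Assigning subjects at random is an assumption added here (the lesson only says the subjects are separated), and the function name `assign_groups` and the plant labels are hypothetical.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a control group and an experimental group."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the original list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subjects: 20 plants in a fertilizer experiment.
plants = [f"plant_{i}" for i in range(1, 21)]
control, experimental = assign_groups(plants, seed=1)

print("control:     ", control)
print("experimental:", experimental)
```

Only the experimental group would then receive the treatment, so that any difference between the groups is less likely to come from how the groups were formed.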
A city councilman wants to determine whether a new skateboard park or a new ice-skating rink should be built as the new community building project. The new project will be located in the city park.
Identify the target population.
What design methodology would be best to find out how the community feels about the two proposed community building projects?
Explain why surveying members of the ice hockey team about their preference is not representative of the population.
Determine whether each of the following situations demonstrates a sample survey, an experiment, or an observational study.
A grocery store wants to know if their customers prefer using the self-checkouts or if they prefer using the standard checkout lanes that are staffed.
A group of students wants to know how different levels of fertilizer affect plant growth.
Endangered, wild wolves were reintroduced to Yellowstone National Park. The conservationists want to know if, and by how much, the population of wolves is growing.
A researcher captures 400 fish in a lake, tags them, then releases them. The following day, he captures 1200 fish, of which 100 have tags attached to them.
Describe a statistic that could be calculated from the given information.
What inference can be made about the population based on the statistic from part (a)?
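One possible way to work through this example is sketched below in Python. It takes the proportion of tagged fish in the second catch as the statistic, then uses it to estimate the total number of fish in the lake. This assumes the tagged fish mixed evenly with the rest of the lake before the second catch, which the problem does not state. The variable names are illustrative.

```python
tagged_released = 400    # fish tagged and released on the first day
second_catch = 1200      # fish captured on the second day
tagged_in_catch = 100    # fish in the second catch that carry a tag

# Statistic: proportion of the second catch (the sample) that is tagged.
sample_proportion = tagged_in_catch / second_catch

# Inference: if about the same proportion of the whole lake is tagged, then
# tagged_released / N = tagged_in_catch / second_catch, so
# N = tagged_released * second_catch / tagged_in_catch.
estimated_population = tagged_released * second_catch / tagged_in_catch

print(f"proportion tagged in sample: {sample_proportion:.3f}")    # 0.083
print(f"estimated fish in the lake:  {round(estimated_population)}")  # 4800
```

The estimate of about 4800 fish is an inference, not a count: it is only as good as the assumption that the second catch is representative of the lake.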
A parameter is a number that summarizes data from a population. A statistic is a number that summarizes data from a sample. A survey conducted on every member of a population is called a census.
If the sample was representative of the population, the statistic may be used to understand the population. When a statistic is applied to a population, we are making an inference.
There are three methods of statistical design:
Sample survey - used to gather information from a group of individuals
Observational study - used to determine a correlation between two variables without outside intervention
Experiment - used to determine cause-and-effect relationships
To avoid bias when gathering sample data, it is important that the method in which the data is collected is random.
Randomness is one way to ensure that the sample is representative of the population. A few different methods of creating a sample are described below.
| Sampling method | Description |
|---|---|
| Systematic sample | Objects are chosen based on a consistent rule. |
| Stratified sample | Objects are separated into groups based on a characteristic. Objects are then randomly selected from each group. |
| Cluster sample | Objects in the population are randomly separated into groups. One or some of the groups are randomly selected, then objects within the selected groups are randomly chosen. |
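The three methods in the table can be sketched in Python as follows. This is an illustrative sketch, not a standard library: the student population, the function names, and the group sizes are all assumptions made for the example, and the systematic rule used here is "every `step`-th object from a random starting point".

```python
import random

rng = random.Random(42)

# Hypothetical population: 120 students, 30 in each of grades 9 to 12.
population = [{"id": i, "grade": 9 + i // 30} for i in range(120)]

def systematic_sample(items, step):
    """Choose every step-th object, starting from a random position."""
    start = rng.randrange(step)
    return items[start::step]

def stratified_sample(items, key, n_per_group):
    """Group objects by a characteristic, then randomly pick from each group."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    chosen = []
    for members in groups.values():
        chosen.extend(rng.sample(members, n_per_group))
    return chosen

def cluster_sample(items, n_clusters, k_clusters, n_per_cluster):
    """Randomly split objects into clusters, randomly pick some clusters,
    then randomly pick objects within the chosen clusters."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    clusters = [shuffled[i::n_clusters] for i in range(n_clusters)]
    chosen = []
    for cluster in rng.sample(clusters, k_clusters):
        chosen.extend(rng.sample(cluster, n_per_cluster))
    return chosen

systematic = systematic_sample(population, step=10)
stratified = stratified_sample(population, key=lambda s: s["grade"], n_per_group=5)
clustered = cluster_sample(population, n_clusters=12, k_clusters=3, n_per_cluster=4)

print(len(systematic), len(stratified), len(clustered))
```

Note that the stratified sample is guaranteed to include every grade, while the cluster sample only includes students from the clusters that happened to be selected.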
A city mayor needs to decide which intersections in the city need stop signs, which ones need stoplights, and which ones should be converted to roundabouts. She decides to give a survey to determine how people feel about the current road conditions. She intends to prioritize fixing the intersections that cause the most frustration for drivers.
Decide whether or not the results from the following surveys are an accurate representation of the population.
If they are not an accurate representation, explain why.
The mayor asks the principal of the local high school to give the survey to 100 random students who drive themselves to school.
The mayor gives the survey to everyone in her neighborhood.
The mayor sets up a booth at a festival and asks anyone who drives and is interested to fill out a survey.
The mayor randomly selects 100 members from the voter registration and conducts phone surveys.
The mayor asks the staff members at the DMV (Department of Motor Vehicles) to ask everyone who comes in that day if they'd be willing to participate in the survey.
If a sample is not representative of the entire population, we cannot use the survey to draw conclusions or make inferences about the population. Instead, we say that the survey has bias. There are a number of potential sources of bias that we should avoid:
Poor sampling techniques
If the people being surveyed do not resemble the population, the survey is likely to be biased.
Convenience samples, where samples are chosen because they are easily available, introduce bias. These groups are likely to have particular traits in common that are not representative of the population as a whole.
Self-selected samples, where people volunteer their input, introduce bias. People who choose to participate often have strong opinions that might not be representative of the population as a whole.
Too small of a sample
In general, the larger the number of people surveyed, the closer the results tend to be to those of a census. This is known as the law of large numbers.
Poor question wording
If the question asked does not address the purpose of the study, the responses cannot be used to interpret the variable of interest.
Using loaded or leading questions
Avoid questions which use words that suggest preference, invoke emotion, or might otherwise influence the results of the survey.
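The effect of sample size described under "Too small of a sample" can be explored with a small simulation. The Python sketch below uses a made-up population of 10,000 people in which 40% prefer some option, and compares sample proportions at several sample sizes. The population, the 40% figure, and the function name `sample_proportion` are assumptions for illustration.

```python
import random

rng = random.Random(7)

# Hypothetical population of 10,000 people: 1 = prefers option A, 0 = does not.
population = [1] * 4000 + [0] * 6000
true_proportion = sum(population) / len(population)

def sample_proportion(n):
    """Proportion preferring option A in a random sample of n people."""
    sample = rng.sample(population, n)
    return sum(sample) / n

print(f"population proportion: {true_proportion}")
for n in (10, 100, 1000, 10000):
    # With n = 10000 the sample is the whole population, i.e. a census.
    print(f"n = {n:>5}: sample proportion = {sample_proportion(n):.3f}")
```

Small samples can land far from the population value by chance, while larger random samples tend to land closer. A large sample does not fix the other sources of bias, though: a large convenience sample is still a convenience sample.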
The nutrition team at a school wants to know to what extent students are making healthy lunch choices. The school cafeteria offers a salad bar and a hot lunch option, and also has vending machines available. For each of the following proposed designs, describe the sampling method and decide if it is biased.
Observe the lunch choices of 3 randomly selected students from each of 15 randomly selected lunch tables in the cafeteria.
Survey 20 random students in the hallway between periods after lunch.
Observe the lunch choices of 50 randomly selected students from each grade level.
Observe the lunch choices of every 5th student that enters the cafeteria on a particular day.
Students in a certain state must take 4 years of math in high school. A school is deciding whether or not to add a statistics course as an additional option for students in their senior year.
Mario surveyed 20 students from his junior year advanced pre-calculus class to find out whether juniors at his school think the school should offer a statistics class. 30% of students said yes, and 70% of students said no. The school has 750 juniors.
State if 70% is a statistic or parameter. Explain how you know.
State the sampling method that Mario used to gather data. Explain your reasoning.
Write an invalid conclusion based on Mario's survey results. Explain why the claim is invalid.
Describe a plan that Mario can use to decide if there is enough interest in a senior-level statistics course next year. The plan should include the statistical question being asked, the design of the study, the target population, the sample size, and the sampling method.
When choosing a sample for a survey, observational study, or experiment, it is important that the sample is chosen randomly so that it is representative of the population. The following are various random sampling methods:
Simple random sample
Systematic sample
Stratified sample
Cluster sample
The following are potential sources of bias and should be avoided when conducting a survey, observational study, or experiment:
Poor sampling methods
Too small of a sample
Poor question wording
Using loaded or leading questions