Data is a crucial aspect of our day-to-day lives. It helps us to understand the world around us and make informed decisions. The process of working with data involves a series of steps commonly referred to as the data cycle. This cycle includes formulating questions, collecting or acquiring data, organizing and representing data, analyzing data, and communicating results.
To help us formulate appropriate questions, it is helpful to know what type of data we want to explore.
Person | Age (years) | Systolic blood pressure (mmHg) |
---|---|---|
Art | 30 | 121 |
Kumi | 40 | 140 |
Isla | 50 | 134 |
Daria | 60 | 154 |
Xia | 70 | 146 |
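The table above can be viewed as bivariate data by pairing each person's two measurements. The following is a minimal Python sketch of that idea; the variable names `records` and `pairs` are illustrative choices, not part of the lesson.

```python
# Each record holds the two measurements taken from the same person,
# copied from the table above.
records = [
    ("Art", 30, 121),
    ("Kumi", 40, 140),
    ("Isla", 50, 134),
    ("Daria", 60, 154),
    ("Xia", 70, 146),
]

# Bivariate data: one (age, systolic blood pressure) pair per person.
pairs = [(age, pressure) for _name, age, pressure in records]

for age, pressure in pairs:
    print(f"age {age} years -> {pressure} mmHg")
```

Keeping the two values together in one pair matters: a relationship between the variables can only be explored if each age stays matched with the blood pressure of the same person.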
To start working with bivariate data, we need to formulate a statistical question.
Statistical question | Not statistical question |
---|---|
Is there a relationship between age and systolic blood pressure? | What is your blood pressure? |
Do test scores increase as the amount of time studying increases? | How long did you study and what was your grade? |
Do pens that last longer cost more? | Are there pens under five dollars that will last all year? |
A statistical question is different from a survey question, which is asked of the people in a study.
We need to make sure that questions are not leading people to answer a particular way. This means not using emotive language or suggesting a particular answer.
Good survey question | Leading question |
---|---|
Do you watch soccer? | Do you watch the most popular sport in the world, soccer? |
How would you rate your meal? | What did you think of the meal from the outstanding chef? |
What was your average speed driving here? | Did you do the wrong thing and go over the speed limit to get here? |
Notice how the good questions are very neutral and the leading questions may encourage people to respond in a particular way.
Select the question(s) which could be answered by collecting bivariate data. Select all that apply.
The local ice cream shop has noticed that their sales seem to be related to the temperature outside. They want to investigate this relationship more closely.
Write a statistical question related to this scenario.
In a study conducted at a high school, students' study times (in hours) and their corresponding test scores (out of 100) were recorded for one particular examination. The data is to be analyzed to understand the relationship between study time and test scores.
Identify the variables involved in this scenario.
Rewrite this question so it is not leading and it could be used to accurately collect data.
"We believe students who study more do better on tests. How much time did you spend studying last night? Did you do well on the test?"
Data Cycle: A process for working with data, which includes formulating questions, collecting data, organizing and representing data, analyzing data, and communicating results.
Bivariate Data: A type of data that involves two variables, allowing us to explore potential relationships between them.
Statistical question: A clear, concise question that can be answered by collecting and analyzing data.
Variables: Quantities or qualities that can be measured or classified, which help explain a given situation or answer statistical questions.
Examples of bivariate data analysis include studying the relationship between a person's study time and their test scores, or a city's population density and its air quality.
Once we have formulated our statistical question and identified our variables, we are now ready to collect the necessary data.
There are several main methods for data collection:
Observation: Watching and noting things as they happen
Measurement: Using tools to find out how much, how long, or how heavy something is
Survey: Asking people questions to get information
Experiment: Doing tests in a controlled way to get data
Acquire existing data: Using a secondary source, usually an online database, to get raw or summarized data.
This is where sampling may come into play. Sampling methods are techniques used to select a representative subset of the population, known as a sample. A population refers to every member of a group. A sample is a subset (or a smaller group) of the population. The type of sampling method we choose can greatly influence the quality of our data and, therefore, the conclusions we draw from it.
Collecting data from every member of a population is the most accurate way of gathering information, but it is not always practical and can be very expensive or time-consuming. Typically, a survey is instead carried out on a sample of the population, which is quicker and less expensive.
There are several types of sampling methods, each with its own strengths and weaknesses. The goal is to have a sample that is representative, meaning it has the same characteristics as the population.
While simple random sampling can be one of the cheapest and least time-consuming methods, it may not represent every subgroup within a larger population.
If a sample is not representative, we may say it is biased.
One particular type of biased or non-random sample is a convenience sample.
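As a rough illustration of the difference, the Python sketch below draws a simple random sample and a convenience sample from the same list. The population of 200 student ID numbers and both function names are hypothetical, chosen only for this example.

```python
import random

# Hypothetical population: student ID numbers 1 to 200.
population = list(range(1, 201))

def simple_random_sample(members, size, seed=None):
    """Every member has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(members, size)

def convenience_sample(members, size):
    """Take whoever is easiest to reach: here, the first IDs on the list."""
    return members[:size]

random_ids = simple_random_sample(population, 10, seed=1)
easy_ids = convenience_sample(population, 10)

print("simple random:", sorted(random_ids))
print("convenience:  ", easy_ids)
```

The convenience sample always contains IDs 1 to 10, so anyone further down the list can never be chosen. That is one way a sample can end up biased.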
There are other sampling techniques that can lead to more representative samples. These include:
Systematic Sampling: A sampling technique where a starting point is chosen at random, and then items are chosen at regular intervals. This method is often used by manufacturers for sampling products on a production line. For example, we may call every tenth business in the phone book or select every fifth bottle from a production line.
Stratified Sampling: A sampling method that involves dividing the population into subgroups, or strata, and then selecting a separate random sample from each stratum. If one subgroup is larger than another, then we should select proportionally more people from that stratum. For example, dividing a group into children, adults, and seniors and then selecting a proportional number of people from each group.
Cluster Sampling: A sampling method where the population is divided into groups, or clusters. Then, a random sample of clusters is selected, and all members within selected clusters are included in the sample. It is like taking a sample of small samples.
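The three techniques above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function names, the 50 numbered bottles, the stratum sizes (20 children, 60 adults, 20 seniors), and the six classes of four students are all assumptions made up for the example.

```python
import random

def systematic_sample(members, step, rng):
    """Pick a random starting point, then take every `step`-th member."""
    start = rng.randrange(step)
    return members[start::step]

def stratified_sample(strata, total, rng):
    """Draw a random sample from every stratum, sized in proportion to it."""
    population_size = sum(len(group) for group in strata.values())
    sample = []
    for group in strata.values():
        # Rounding may make the overall size differ slightly from `total`.
        size = round(total * len(group) / population_size)
        sample.extend(rng.sample(group, size))
    return sample

def cluster_sample(clusters, how_many, rng):
    """Randomly choose whole clusters and keep every member of each one."""
    chosen = rng.sample(list(clusters), how_many)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Systematic: every fifth bottle from a hypothetical run of 50 bottles.
bottles = list(range(1, 51))
every_fifth = systematic_sample(bottles, 5, rng)

# Stratified: hypothetical strata of different sizes.
strata = {
    "children": [f"child{i}" for i in range(20)],
    "adults": [f"adult{i}" for i in range(60)],
    "seniors": [f"senior{i}" for i in range(20)],
}
proportional = stratified_sample(strata, 10, rng)

# Cluster: hypothetical classes of four students each.
classes = {
    f"class{c}": [f"class{c}-student{i}" for i in range(4)] for c in range(6)
}
two_classes = cluster_sample(classes, 2, rng)

print(len(every_fifth), len(proportional), len(two_classes))
```

With these made-up sizes, the stratified sample of 10 contains 2 children, 6 adults, and 2 seniors, matching each group's share of the population. The cluster sample contains all 8 students from the 2 selected classes and nobody from the other 4.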
It's important to select an appropriate method to collect a representative sample. This will help to accurately analyze the relationship between our two variables. For example, if we wanted to investigate the relationship between exercise frequency and overall health across different age groups, we might choose a stratified sample to ensure we collect data from all age groups.
Determine whether each situation demonstrates a sample survey, an experiment, or an observational study.
A hospital wants to compare the recovery rates of patients using a new dosage versus those using a standard treatment.
A new book has been released. A library wants to know if their patrons prefer e-books or physical books to determine how many of each to get.
A gym wants to know if there is a relationship between the number of visits someone had in their first month and the length of time they will be at the gym.
A social worker notices that many of the children he helps say they want to go to the playground during their sessions. He is curious if there is a relationship between age of children and time spent at playgrounds each week in his town.
Identify the target population.
Which method would be best to collect the relevant data?
Explain why doing a sample of the children at a park one Monday morning would not be a good sample.
Dr. Jane is a health researcher and she formulated the question "Is there a relationship between the frequency of exercise and overall health among working adults in Washington, DC?" She wants to collect data for her research.
Choose an appropriate sampling method.
After we formulate a clear statistical question, we use the data cycle to collect, show, and explain information. To get data, we can use methods like:
Watching (Observation)
Measuring
Asking questions (Survey)
Doing experiments
Acquiring existing secondary data
Sampling methods are techniques to collect data from a representative subset of the population, known as a sample.
Population: every member of a group.
Sample: a subset of the population.
Types of sampling methods include:
Simple Random Sampling: every member of the population has an equal chance of being selected.
Systematic Sampling: involves selecting every nth member of the population.
Stratified Sampling: dividing the population into subgroups, and then selecting a separate random sample from each subgroup.
Cluster Sampling: the population is divided into groups, or clusters. Then, a random sample of clusters is selected, and all members within selected clusters are included in the sample.
The type of sampling method chosen can greatly influence the quality of data collected and the conclusions drawn from it.