We often analyze bivariate data to determine whether a relationship between the two variables exists. A scatterplot can be used to display bivariate, numerical data once the independent and dependent variables are defined.
The analysis of bivariate data should include:
Form, usually described as a linear relationship or a nonlinear relationship
Strength, describing how closely the data points match the model line or curve
If the relationship between the variables is linear, the direction of the relationship can be described as positive or negative.
Positive relationship: as the independent variable increases, the dependent variable increases
Negative relationship: as the independent variable increases, the dependent variable decreases
The dashed lines in the scatterplots will help us visualize possible trends in the data.
When comparing bivariate data, it may be necessary to separate the data into categories. For example, when comparing the weights of dogs during their first year of life, the combined data might not show a clear relationship because large dogs (like Boxers) grow much more than small dogs (like Yorkies).
We can compare categorical variables in scatterplots by using different colors or symbols.
The weights of small, medium, and large dogs over time are shown in the scatterplot.
Different colored dots represent the different categories or sizes of dogs.
For each category, there is a strong, positive linear relationship between the dogs' age and weight.
It is important to note that the existence of a relationship between two variables in a scatterplot, regardless of strength, does not necessarily imply that one causes the other. Causation can only be determined from an appropriately designed statistical experiment.
For each scatterplot, determine whether the variables have a linear relationship, a nonlinear relationship, or no relationship. If there is a relationship, describe its strength.
If the relationship is linear, describe the direction as positive or negative.
Justin recently had surgery for a torn muscle in his leg. He is taking medication for the pain as well as attending regular physical therapy sessions. He learns that not everyone's insurance plan covers physical therapy.
Justin wants to investigate whether the post-surgery pain from a torn muscle lasts longer for patients who only take medication compared to those who can attend physical therapy sessions. Which question should he use for his investigation?
Describe the data that would need to be collected to answer Justin's statistical question.
A surfing company has store locations in various coastal states across the U.S. When analyzing its data, the company separates the store locations into two regions: the Western region and the Eastern region. The scatterplot shows data collected to answer the question, "How have the sales of our product changed over time in each of the sales regions?"
Identify the independent and dependent variables in this context.
The owner of the company makes this conclusion: "The sales of the product are improving with time." Which sales region was the owner analyzing?
Adria heard that children who learn to speak at a young age are more likely to be gifted and talented in later stages of life. She decides to investigate this using the data cycle.
Formulate a statistical question for Adria that would lead to the collection of data that can be represented in a scatterplot.
The table shows the ages of some teenagers when they first spoke and their results on an aptitude test:
Age when first spoke (months) | 14 | 27 | 9 | 16 | 21 | 17 | 10 | 7 | 19 | 24 |
---|---|---|---|---|---|---|---|---|---|---|
Aptitude test results | 96 | 69 | 93 | 101 | 87 | 92 | 99 | 104 | 93 | 97 |
Create a scatterplot to model the data.
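Without graphing software, a rough text-based scatterplot can be sketched in Python. The helper `text_scatterplot` and its grid size are hypothetical choices. Points are placed on a character grid by rounding, so points that are close together can land in the same cell; the result is only a rough picture, not a substitute for a scaled graph with labeled axes.

```python
def text_scatterplot(xs, ys, width=40, height=12, mark="*"):
    """Return the rows of a rough character-grid scatterplot.

    The independent variable (xs) runs left to right and the dependent
    variable (ys) runs bottom to top. Points that round to the same cell
    are drawn as a single mark.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero for constant data
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))  # row 0 is the top
        grid[row][col] = mark
    # Left border for each row, then a bottom axis line.
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Data from the table: age when first spoke (months) and aptitude test result.
ages = [14, 27, 9, 16, 21, 17, 10, 7, 19, 24]
scores = [96, 69, 93, 101, 87, 92, 99, 104, 93, 97]
for line in text_scatterplot(ages, scores):
    print(line)
```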
Draw a conclusion about the data by answering the statistical question from part (a).