5.08 Scatter plots

Lesson

Concept summary

In statistics, bivariate data is data on two variables, where each value of one of the variables is paired with a value of the other variable.

We can analyze bivariate data by looking for an association between the two variables.

Association

A way to describe the form, direction or strength of the relationship between the two variables in a bivariate data set.

The analysis of bivariate data should include:

  • Form, usually described as a linear association or nonlinear association
  • Strength, describing how closely the data points match the form
  • Direction, usually described as positive association or negative association

A scatterplot can be used to display bivariate data once the independent and dependent variables are defined.

The correlation coefficient, r, is a statistic that can describe both the strength and direction of a linear association.
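As a rough sketch of how r could be computed, the function below uses the standard Pearson formula: the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The lesson does not state this formula, so the function name `pearson_r` and the sample data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of the paired lists xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up data lying exactly on straight lines, for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # line with positive slope
falling = [10 - 3 * x for x in xs]  # line with negative slope

print(pearson_r(xs, rising))   # approximately 1
print(pearson_r(xs, falling))  # approximately -1
```

The sign of the result gives the direction of the linear association, and its distance from 0 gives the strength.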

[Figure: six scatterplots of y against x, each labeled with its correlation]

  • Perfect positive correlation, r=1
  • Perfect negative correlation, r=-1
  • Strong negative correlation, r=-0.974
  • Weak positive correlation, r=0.306
  • Moderate negative correlation, r=-0.684
  • No correlation, r=0.072

It is important to distinguish between causation, where changes in one variable cause changes in the other, and correlation, where the two variables are related but one does not necessarily influence the other.

Causation

A relationship between two events where one event causes the other.
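A small sketch can show how two variables may be strongly correlated without either one causing the other. Every number below is invented for illustration: daily temperature is assumed to drive both ice cream sales and sunburn cases, and `pearson_r` is a helper written for this example rather than anything defined in the lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of the paired lists xs and ys."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: both quantities rise on hotter days.
temperature = [18, 21, 24, 27, 30, 33, 36]      # degrees Celsius
ice_cream_sales = [40, 48, 55, 61, 70, 76, 85]  # units sold per day
sunburn_cases = [3, 5, 6, 9, 10, 13, 14]        # cases per day

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"r between ice cream sales and sunburn cases: {r:.2f}")
```

In this invented data the value of r is close to 1, yet the association comes from the shared dependence on temperature; selling ice cream does not cause sunburn.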

Worked examples

Example 1

Determine whether the following statement is true or false:

"There is a causal relationship between the number of cigarettes a person smokes and their life expectancy."

Approach

It is generally understood that smoking cigarettes can cause diseases such as cancer, which are known to affect life expectancy.

Solution

True

Reflection

The evidence of a causal relationship usually comes from generally accepted truths or verified research studies. A causal relationship is not confirmed by finding an association between variables.

Example 2

A study was conducted to find the relationship between the age at which a child first speaks and their level of intelligence as teenagers. The following table shows the ages of some teenagers when they first spoke, and their results in an aptitude test:

Age when first spoke (months) | 14 | 27 |  9 |  16 | 21 | 17 | 10 |   7 | 19 | 24
Aptitude test results         | 96 | 69 | 90 | 101 | 87 | 92 | 99 | 104 | 93 | 97
a

Create a scatterplot to model the data.

Approach

Let x=\text{Age when the child first spoke} and y=\text{Aptitude test results as a teen}.

The minimum value for x is 7 and the maximum is 27, so we can use a scale of 5 to label the x-axis. The minimum value for y is 69 and the maximum is 104, so we can use a scale of 20 to label the y-axis.
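The choice of scale can be checked with a short sketch. The helper name `axis_ticks` is hypothetical, and it uses just one possible convention: tick marks at multiples of the step, starting at or below the smallest value and ending at or above the largest. A hand-drawn axis may start elsewhere, for example at zero. The data values are read from the table in this example.

```python
import math

def axis_ticks(values, step):
    """Multiples of `step` that span from at or below min(values) to at or above max(values)."""
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    return list(range(low, high + step, step))

ages = [14, 27, 9, 16, 21, 17, 10, 7, 19, 24]        # months
scores = [96, 69, 90, 101, 87, 92, 99, 104, 93, 97]  # aptitude test results

print(min(ages), max(ages))      # 7 27
print(axis_ticks(ages, 5))       # [5, 10, 15, 20, 25, 30]
print(min(scores), max(scores))  # 69 104
print(axis_ticks(scores, 20))    # [60, 80, 100, 120]
```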

Solution

[Scatterplot of the data: age (months) on the x-axis, aptitude score on the y-axis]
b

Estimate the correlation coefficient and describe the association between the variables.

Approach

The correlation coefficient will be a value between -1 and 1, depending on the strength and direction of the association.

Describe the association with the following attributes:

  • Form: linear or nonlinear
  • Strength: strong, moderate or weak
  • Direction: positive or negative
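Strength and direction can be read from a value of r, as in the minimal sketch below. The cutoffs 0.7, 0.4 and 0.1 are assumptions, chosen only so that the labels agree with the six example plots in the concept summary. They are not a standard given in the lesson, and the function name is hypothetical. Form cannot be judged from r alone and still has to be read from the scatterplot.

```python
import math

def describe_linear_association(r):
    """Return (strength, direction) labels for a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    # Cutoffs below are assumptions made for this sketch.
    if math.isclose(size, 1.0):
        strength = "perfect"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    elif size >= 0.1:
        strength = "weak"
    else:
        return ("none", "no direction")
    direction = "positive" if r > 0 else "negative"
    return (strength, direction)

# The six r values from the example plots in the concept summary
for r in [1, -1, -0.974, 0.306, -0.684, 0.072]:
    print(r, describe_linear_association(r))
```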

Solution

There is a moderate, negative, linear association between the age when a child first spoke and their aptitude test score as a teenager. A reasonable estimate for the correlation coefficient is between -0.8 and -0.6; the value calculated from the ten data points is approximately -0.66.

Reflection

The closer the points are to the form of the association (here, a straight line), the stronger the association will be. Extreme data values can have a large impact on the correlation coefficient.
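The influence of an extreme value can be checked numerically. The sketch below computes r for the ten data points in this example, then again with the point (27, 69) left out. The helper `pearson_r` uses the standard Pearson formula and is an assumption of this sketch, not something defined in the lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of the paired lists xs and ys."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

ages = [14, 27, 9, 16, 21, 17, 10, 7, 19, 24]        # months
scores = [96, 69, 90, 101, 87, 92, 99, 104, 93, 97]  # aptitude test results

r_all = pearson_r(ages, scores)

# Leave out the point that sits farthest from the others: (27, 69).
kept = [(x, y) for x, y in zip(ages, scores) if (x, y) != (27, 69)]
r_without = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"all ten points:   r = {r_all:.3f}")
print(f"without (27, 69): r = {r_without:.3f}")
```

For this data set, leaving out the single extreme point moves r noticeably closer to 0, so that one point accounts for much of the apparent strength of the association.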

c

Determine if there is enough evidence to suggest a causal relationship between the age when a child first speaks and their intelligence as teenagers.

Solution

No. Correlation alone does not establish causation.

Reflection

An association between two quantities is evidence to suggest that the value of one quantity can be predicted with some accuracy given the other quantity, but is not enough evidence to suggest that changes in one quantity directly cause changes in the other.

Outcomes

M1.N.Q.A.1

Use units as a way to understand real-world problems.*

M1.N.Q.A.1.A

Choose and interpret the scale and the origin in graphs and data displays.*

M1.N.Q.A.1.C

Define and justify appropriate quantities within a context for the purpose of modeling.*

M1.S.ID.A.1

Represent data from two quantitative variables on a scatter plot, and describe how the variables are related. Fit a function to the data; use functions fitted to data to solve problems in the context of the data.*

M1.S.ID.B.4

Explain the differences between correlation and causation. Recognize situations where an additional factor may be affecting correlated data.*

M1.MP2

Reason abstractly and quantitatively.

M1.MP3

Construct viable arguments and critique the reasoning of others.

M1.MP4

Model with mathematics.

M1.MP5

Use appropriate tools strategically.

M1.MP6

Attend to precision.
