
11.02 Scatter plots

Introduction

We have learned how to identify independent and dependent variables. We will now construct and interpret scatter plots to determine whether a relationship exists between two variables.

Association and scatter plots

The first step in determining the presence and type of relationship is to plot the data on a scatter plot. A scatter plot is a statistical display that is often used to determine whether the expected relationship exists between two quantitative variables.

Once we have a scatter plot, we can start to perform analysis such as determining association.

One way to analyze scatter plots is to describe the shape that the data takes. The data may cluster around a straight line or around some other curve, so the relationship is either:

  • linear (a straight line)
    [Figure: scatter plot with the x-axis from 0 to 35 and the y-axis from 0 to 1.2; the points slope roughly upward from the bottom left to the upper right.]
  • non-linear (not a straight line)
    [Figure: scatter plot with the x-axis from 0 to 35 and the y-axis from 0 to 1.2; the points roughly follow an inverted U shape from the bottom left to the bottom right.]

An association is a way of expressing a relationship between two variables and, more specifically, how strongly pairs of data are related. We describe the association from data using language like positive association, negative association, or no association. We can even further strengthen the language by using the words strong or weak to describe the association.

Linear patterns reveal whether or not two measurements are connected to each other. In other words, the presence of a linear pattern signals that the two sets of data have a linear association. One way of understanding these relationships is by plotting ordered pairs on a scatter plot. This makes it easier to recognize patterns in the data, especially whether or not these patterns appear to be linear.

This linear relationship can be seen through close and consistent grouping in a scatter plot. The more closely the dots resemble a straight line, the stronger the association between the variables.

A positive association is when the data appears to gather in a positive relationship, similar to a straight line with a positive slope. In other words, as one variable increases, the other variable also increases, and as one variable decreases, the other decreases as well. The variables change in the same direction.

There are three types of positive association:

  • Perfect positive association, where the data points line up exactly on a straight line with a positive slope.
  • Strong positive association, where the data points are closely clustered and resemble a straight line with a positive slope.
  • Weak positive association, where the relationship is still positive but does not resemble a line much at all.

[Figure: scatter plot of Height (cm) on the vertical axis against Arm Span (cm) on the horizontal axis, both axes running from 150 to 195.]

For example, the scatter plot shows a strong positive association between a person's height and arm span. You can see that as the first variable increases, the second increases too.

A negative association is when the data appears to gather in a negative relationship, similar to a straight line with a negative slope. In other words, as one variable increases, the other one decreases.

Like positive association, there are three types of negative association:

  • Perfect negative association, where the data points line up exactly on a straight line with a negative slope.
  • Strong negative association, where the data points are closely clustered and resemble a straight line with a negative slope.
  • Weak negative association, where the relationship is still negative but does not resemble a line much at all.

[Figure: scatter plot of Accidents on the vertical axis (20 to 45) against Age on the horizontal axis (20 to 70).]

The scatter plot shows a strong negative association between age and the number of accidents. You can see that as age increases, the number of accidents decreases.
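The "perfect" cases in the two lists above can be checked exactly, because they require every point to sit on one straight line. The following is a minimal Python sketch of that check, not part of the original lesson. The function name `perfect_association` and the sample points in the usage note are hypothetical, and the sketch assumes integer coordinates and that the first two points have different x-values.

```python
from fractions import Fraction


def perfect_association(points):
    """Classify points as 'perfect positive', 'perfect negative', or 'not perfect'.

    Hypothetical helper for illustration only.
    Assumes integer coordinates, at least two points, and that the first
    two points have different x-values.
    """
    (x0, y0), (x1, y1) = points[0], points[1]
    # Exact slope of the line through the first two points.
    slope = Fraction(y1 - y0, x1 - x0)

    # Every point must lie exactly on that line.
    for x, y in points:
        if y - y0 != slope * (x - x0):
            return "not perfect"

    if slope > 0:
        return "perfect positive"
    if slope < 0:
        return "perfect negative"
    # A horizontal line: y does not change as x changes.
    return "not perfect"
```

For instance, `perfect_association([(1, 2), (2, 4), (3, 6)])` classifies the points as perfect positive, while moving the last point to `(3, 7)` breaks the line. Strong and weak associations have no such exact test; they are judged by how closely the points cluster around a line.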

If the points of the scatter plot are spread randomly, then we say there is no association.

[Figure: scatter plot with x- and y-axes from 5 to 20, illustrating points with no association.]

Examples

Example 1

Identify the type of association in the following scatter plot.

[Figure: scatter plot with the x-axis from 1 to 11 and the y-axis from 1 to 9.]

Worked Solution
Create a strategy

Consider whether the points lie approximately in a line with a positive or negative slope.

Apply the idea

As the x-values increase, the y-values increase. The points closely resemble a straight line with a positive slope, so this is a strong positive association.

Example 2

The following table shows the number of traffic accidents associated with a sample of drivers of different age groups.

Age    Accidents
20     41
25     44
30     39
35     34
40     30
45     25
50     22
55     18
60     19
65     17

a

Construct a scatter plot to represent the above data.

Worked Solution
Create a strategy

Draw the scatter plot by plotting each point from the table.

Apply the idea

[Figure: scatter plot of the table data, with Age on the horizontal axis (20 to 70) and Accidents on the vertical axis (20 to 45).]

Age is the independent variable, so it goes on the horizontal axis. Accidents is the dependent variable, so it goes on the vertical axis.

Each row of the table becomes one point. For example, the first row corresponds to the point (20, 41) on the graph.
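The same construction can be sketched in Python. This is an illustration rather than part of the lesson: the variable names, the function `draw_scatter_plot`, and the output filename are hypothetical, and the drawing step assumes the third-party matplotlib library is installed.

```python
# Data copied from the table in Example 2.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
accidents = [41, 44, 39, 34, 30, 25, 22, 18, 19, 17]

# Pair each age (horizontal axis) with its accident count (vertical axis).
points = list(zip(ages, accidents))


def draw_scatter_plot(points, filename="scatter.png"):
    """Save a scatter plot of the points to an image file.

    Hypothetical helper; requires matplotlib, imported here so the rest
    of the sketch runs without it.
    """
    import matplotlib.pyplot as plt

    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)           # one dot per (age, accidents) pair
    ax.set_xlabel("Age")         # independent variable
    ax.set_ylabel("Accidents")   # dependent variable
    fig.savefig(filename)
    plt.close(fig)
```

Here `points[0]` is `(20, 41)`, matching the first row of the table. Calling `draw_scatter_plot(points)` would write the plot to the assumed file name.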

b

Is the association between a person's age and the number of accidents they are involved in positive or negative?

Worked Solution
Create a strategy

Check the trend of the data on the scatter plot.

Apply the idea

Based on the scatter plot, as age increases, the number of accidents decreases. So the association between a person's age and the number of accidents is negative.
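The lesson reads the direction off the plot. As a rough cross-check, the sketch below compares every pair of points and counts how often the two variables move in the same direction versus opposite directions. This counting rule is a simple heuristic chosen for illustration, not the method used in the lesson, and the function name `direction` is hypothetical.

```python
from itertools import combinations

# Data copied from the table in Example 2.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
accidents = [41, 44, 39, 34, 30, 25, 22, 18, 19, 17]


def direction(xs, ys):
    """Return 'positive', 'negative', or 'no clear direction'.

    Hypothetical helper: for each pair of points, check whether x and y
    change in the same direction or in opposite directions.
    """
    same = opposite = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:
            same += 1        # both increase or both decrease
        elif product < 0:
            opposite += 1    # one increases while the other decreases
    if same > opposite:
        return "positive"
    if opposite > same:
        return "negative"
    return "no clear direction"


result = direction(ages, accidents)
```

For the table data, 43 of the 45 pairs move in opposite directions, so `result` is `"negative"`, which agrees with the answer above.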

c

Is the association between a person's age and the number of accidents they are involved in strong or weak?

Worked Solution
Create a strategy

Check how closely clustered the data points are.

Apply the idea

Since the points on the scatter plot lie close to a single straight line, the association is strong.
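The lesson judges strength by eye. One common numerical summary, which goes beyond what this lesson covers, is the Pearson correlation coefficient: a value between -1 and 1 that is close to -1 or 1 when the points lie near a straight line. The sketch below computes it for the table data as an illustration only. The function name `correlation` is hypothetical, and the sketch assumes neither variable is constant.

```python
import math

# Data copied from the table in Example 2.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
accidents = [41, 44, 39, 34, 30, 25, 22, 18, 19, 17]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists.

    Hypothetical helper; assumes neither list is constant, so the
    denominator is not zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation(ages, accidents)
```

For this data `r` is close to -1 (about -0.97), which is consistent with describing the association as strong and negative.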

d

Which age group's data represent an outlier?

A  30-year-olds
B  None of them
C  65-year-olds
D  20-year-olds

Worked Solution
Create a strategy

Check on the scatter plot if any points are positioned away from the rest of the data.

Apply the idea

Based on the scatter plot, there is no outlying point. So, the correct answer is B.

Idea summary

A positive association is when the data appears to gather in a positive direction, similar to a straight line with a positive slope. The variables change in the same direction.

A negative association is when the data appears to gather in a negative direction, similar to a straight line with a negative slope. In other words, as one variable increases, the other one decreases.

When there is no relationship between the variables we say they have no association.

Outcomes

8.SP.A.1

Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns such as clustering, outliers, positive or negative association, linear association, and nonlinear association.
