Australia VIC
VCE 12 General 2023

2.04 Scatterplots

Lesson

Introduction

In order to analyse the association between two numerical variables, we can first plot the data in a scatterplot. The explanatory variable (EV) is shown on the horizontal axis and the response variable (RV) is shown on the vertical axis. In this way, each pair of data values is displayed as a point in a two-dimensional coordinate system.

To help us identify any correlation between the two variables, there are three things we focus on when looking at a scatterplot:

  • Direction

  • Form

  • Strength

Scatterplots

The direction of a scatterplot refers to the overall trend shown by the data points as we move from left to right. We can describe the direction of the pattern as having positive correlation, negative correlation, or no correlation:

  • Positive correlation

    • A positive correlation occurs when the RV tends to increase as the EV increases.

    • From a graphical perspective, the y-coordinates tend to increase as the x-coordinates increase, similar to a line with a positive gradient.

  • Negative correlation

    • A negative correlation occurs when the RV tends to decrease as the EV increases.

    • From a graphical perspective, the y-coordinates tend to decrease as the x-coordinates increase, similar to a line with a negative gradient.

  • No correlation

    • No correlation describes a data set that has no relationship between the variables.

    • This can occur with totally unrelated data, or with data that shows no change in the RV as the EV changes (like a horizontal straight line, which has zero gradient).
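The direction described above can also be checked numerically. The following is a minimal sketch, assuming we use the sign of the sum of the products (x - mean of x)(y - mean of y) as the test for whether the RV tends to rise or fall as the EV increases. The function name `direction` is hypothetical and is not part of this lesson.

```python
def direction(xs, ys):
    """Classify the direction of a scatterplot as 'positive', 'negative' or 'none'.

    Assumption: the sign of the sum of products of deviations from the means
    is used to decide whether the RV tends to rise or fall as the EV increases.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Positive products come from points above-right or below-left of the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if abs(s_xy) < 1e-12:
        return "none"
    return "positive" if s_xy > 0 else "negative"


print(direction([1, 2, 3, 4], [2, 4, 5, 8]))  # positive
print(direction([1, 2, 3, 4], [9, 7, 4, 1]))  # negative
print(direction([1, 2, 3, 4], [5, 5, 5, 5]))  # none (horizontal line)
```

Only a perfectly flat pattern returns "none" here; deciding whether a trend is too weak to count as a correlation needs a measure of strength as well.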

When we look at the form of a scatterplot, we check whether the data points follow a straight-line pattern. If the data points lie on or close to a straight line, we say the scatterplot has a linear form.

Forms other than a line may be apparent in a scatterplot. If the data points lie on or close to a curve, it may be appropriate to describe the association between the variables as non-linear. We will only be using linear models in this course.

The strength of a linear correlation relates to how closely the points resemble a straight line.

  • If the points lie exactly on a straight line then we can say that there is a perfect correlation.

  • If the points are scattered randomly then we can say there is no correlation.

Most scatterplots will fall somewhere in between these two extremes and will display a weak, moderate or strong correlation.

Each type of correlation has a characteristic appearance:

  • Perfect positive correlation: the points lie exactly on a straight line with a positive gradient.

  • Perfect negative correlation: the points lie exactly on a straight line with a negative gradient.

  • Strong positive correlation: the points lie close to a straight line with a positive gradient.

  • Strong negative correlation: the points lie close to a straight line with a negative gradient.

  • Weak positive correlation: the trend is still positive, but the points are widely scattered about a line.

  • Weak negative correlation: the trend is still negative, but the points are widely scattered about a line.

  • No correlation: the points are randomly scattered.

  • No correlation: the points are closely clustered along a horizontal line.
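To attach a label to the strength, one option (not covered in this lesson) is Pearson's correlation coefficient r, which measures the strength of a linear correlation on a scale from -1 to 1. The sketch below computes r and maps |r| to a label. The cut-offs of 0.75, 0.5 and 0.25 are assumed guideline values (textbooks vary), and the function names `pearson_r` and `strength` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        # r is undefined when one variable does not vary (e.g. a horizontal line);
        # this sketch treats that case as r = 0, i.e. no correlation (an assumption).
        return 0.0
    return s_xy / math.sqrt(s_xx * s_yy)


def strength(r):
    """Label the strength of a linear correlation from r, using assumed cut-offs."""
    a = abs(r)
    if math.isclose(a, 1.0):
        return "perfect"
    if a >= 0.75:
        return "strong"
    if a >= 0.5:
        return "moderate"
    if a >= 0.25:
        return "weak"
    return "no correlation"


print(strength(pearson_r([1, 2, 3, 4], [3, 5, 7, 9])))        # perfect
print(strength(pearson_r([1, 2, 3, 4], [2, 4, 5, 8])))        # strong
print(strength(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 2, 4])))  # moderate
print(strength(pearson_r([1, 2, 3, 4, 5], [3, 1, 4, 1, 5])))  # weak
```

Because r only measures linear strength, a clearly curved pattern can still produce a large |r|, so it makes sense to check the form of the scatterplot before relying on r.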

Examples

Example 1

The following table shows the results of an experiment.

X |  1 |  4 |  5 |  8 |  9 | 11 | 13 | 15 | 18 | 19
Y |  2 |  4 |  6 |  8 | 12 | 24 | 30 | 46 | 52 | 64
a

Plot the data from the table on a set of axes.

Worked Solution
Create a strategy

Treat each column in the table of values as a point (X, Y), and plot all of these points.
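Treating each column as a point amounts to pairing each X value with the Y value below it. A minimal Python sketch of that pairing (the variable names are illustrative, not from the lesson):

```python
# Rows of the table from Example 1.
x_values = [1, 4, 5, 8, 9, 11, 13, 15, 18, 19]
y_values = [2, 4, 6, 8, 12, 24, 30, 46, 52, 64]

# zip pairs the rows column by column, giving one (x, y) coordinate per point to plot.
points = list(zip(x_values, y_values))

for x, y in points:
    print(f"({x}, {y})")
```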

Apply the idea
[Scatterplot: the ten points from the table, with X on the horizontal axis (marked 1 to 19) and Y on the vertical axis (marked 5 to 65).]
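Without graphing software, the same points can be sketched roughly as a character grid, with the EV running across and the RV running up. This is a minimal sketch only; `text_scatterplot` and its `width` and `height` parameters are hypothetical, and the grid positions are rounded, so the picture is approximate.

```python
def text_scatterplot(xs, ys, width=20, height=14):
    """Return the rows of a rough character-grid scatterplot.

    The explanatory variable runs across (left to right) and the response
    variable runs up (bottom to top). Assumes xs and ys each contain at least
    two distinct values; otherwise the scaling below divides by zero.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index within the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 of the grid is printed at the top, so flip vertically.
        grid[height - 1 - row][col] = "*"
    # Prefix each row with a vertical axis and finish with a horizontal axis.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


x_values = [1, 4, 5, 8, 9, 11, 13, 15, 18, 19]
y_values = [2, 4, 6, 8, 12, 24, 30, 46, 52, 64]
rows = text_scatterplot(x_values, y_values)
print("\n".join(rows))
```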
b

What is the type of correlation between the data points? Select the best answer.

A. Linear positive
B. Linear negative
C. Non-linear
D. No correlation
Worked Solution
Create a strategy

Decide whether the plotted points follow a straight line, follow a curve, or show no clear pattern.

Apply the idea

Based on the points plotted in part (a), the points follow an upward curve rather than a straight line: Y increases slowly for small values of X and much more quickly for larger values of X. This means the correlation between the data points is non-linear, so the correct answer is option C.
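A rough numerical check supports this conclusion. If the points followed a straight line, the gradient of the line joining the first point to the middle point would be about the same as the gradient of the line joining the middle point to the last point. The sketch below compares the two; the helper names `chord_gradient` and `half_gradients` are hypothetical, and it assumes the points are sorted by X.

```python
# Example 1 data: X is the explanatory variable, Y the response variable.
x_values = [1, 4, 5, 8, 9, 11, 13, 15, 18, 19]
y_values = [2, 4, 6, 8, 12, 24, 30, 46, 52, 64]


def chord_gradient(p, q):
    """Gradient of the straight line joining points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)


def half_gradients(xs, ys):
    """Gradients from the first point to the middle point, and from the middle point to the last."""
    points = list(zip(xs, ys))
    mid = len(points) // 2
    return chord_gradient(points[0], points[mid]), chord_gradient(points[mid], points[-1])


first, second = half_gradients(x_values, y_values)
print(first, second)  # 2.2 5.0

# For comparison, data lying on the straight line y = 2x + 1 gives equal gradients.
line_first, line_second = half_gradients(x_values, [2 * x + 1 for x in x_values])
print(line_first, line_second)  # 2.0 2.0
```

For the Example 1 data the second gradient (5.0) is more than double the first (2.2), which is consistent with an upward curve rather than a line. This is only a supporting check; the form is still judged from the scatterplot itself.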

Idea summary

There are three things we focus on when analysing a scatterplot:

  • Form: linear or non-linear (the shape of the pattern in the data)

If it is linear:

  • Direction: positive or negative (whether a line drawn through the data has a positive or negative gradient)

  • Strength: strong, moderate or weak (how closely the points follow a line)

If there is no connection between the two variables we say there is no correlation.

Outcomes

U3.AoS1.8

two-way frequency tables, segmented bar charts, back-to-back stem plots, parallel boxplots, and scatterplots, and their application in the context of identifying and describing associations

U3.AoS1.21

construct scatterplots and use them to identify and describe associations between two numerical variables
