8. Two Variable Data Analysis

Lesson

In order to analyze the association between two numerical variables, we can first plot the data in a scatterplot. The independent variable is shown on the horizontal axis and the dependent variable is shown on the vertical axis. In this way, each data point is displayed as a point in a two-dimensional coordinate system.

To help us identify any correlation between the two variables, there are three things we focus on when looking at a scatterplot:

- Direction
- Form
- Strength

The direction of the scatterplot refers to the pattern shown by the data points. We can describe the direction of the pattern as having positive correlation, negative correlation or no correlation:

- Positive correlation
  - A positive correlation occurs when the dependent variable **increases** as the independent variable increases.
  - From a graphical perspective, this occurs when the $y$-coordinate **increases** as the $x$-coordinate increases, which is similar to a line with a **positive** slope.
- Negative correlation
  - A negative correlation occurs when the dependent variable **decreases** as the independent variable increases.
  - From a graphical perspective, this occurs when the $y$-coordinate **decreases** as the $x$-coordinate increases, which is similar to a line with a **negative** slope.
- No correlation
  - No correlation describes a data set which has no relationship between the variables.
  - This can come in the form of totally unrelated data, or data that indicates no change in the dependent variable as the independent variable changes (like a horizontal straight line, which has zero slope).
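The three directions can be illustrated with a short calculation. The sketch below is only a rough guide: it uses the sign of the summed products of deviations from the means, which matches the sign of the slope of a least-squares line. The function name `direction` and the sample data are hypothetical and are not part of the lesson.

```python
def direction(xs, ys):
    """Classify direction by the sign of the summed products of deviations.

    This sum has the same sign as the slope of a least-squares line.
    An exact zero is rare in real data, so treat the result as a rough guide.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "none"


# Hypothetical data, for illustration only.
print(direction([1, 2, 3, 4], [2, 4, 5, 9]))  # y rises as x rises
print(direction([1, 2, 3, 4], [9, 6, 5, 1]))  # y falls as x rises
print(direction([1, 2, 3, 4], [5, 5, 5, 5]))  # horizontal line, zero slope
```

A scatterplot is still the better first check, because a single number cannot show whether the pattern is linear or curved.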

When we are looking at the form of a scatterplot, we are looking to see if the data points show a pattern that has a linear form. If the data points lie on or close to a straight line, we can say the scatterplot has a linear form.

Forms other than a line may be apparent in a scatterplot. If the data points lie on or close to a curve, it may be appropriate to infer a non-linear form between the variables.

The strength of a linear correlation relates to how closely the points resemble a straight line.

- If the points lie **exactly** on a straight line then we can say that there is a **perfect correlation.**
- If the points are scattered randomly then we can say there is **no correlation.**

Most scatterplots will fall somewhere in between these two extremes, and will display a **weak, moderate or strong correlation**.

To measure the strength of a linear correlation we calculate something called the **correlation coefficient** (also known as the **r value**). This calculation will be discussed in the next chapter.
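Ahead of that discussion, here is a minimal sketch of how the standard Pearson correlation coefficient can be computed and turned into a verbal description. The function names, the sample data and, in particular, the cut-off values of 0.4 and 0.7 are assumptions made for illustration; different textbooks draw the weak, moderate and strong boundaries in different places.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Pearson's r: summed products of deviations, scaled to lie in [-1, 1].

    Raises ZeroDivisionError if either variable is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def describe_strength(r, weak_below=0.4, strong_from=0.7):
    """Label the size of r. The default cut-offs are illustrative assumptions."""
    size = abs(r)
    if size < weak_below:
        return "weak"
    if size < strong_from:
        return "moderate"
    return "strong"


# Hypothetical data lying exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r = correlation_coefficient(xs, ys)
print(r, describe_strength(r))
```

The sign of `r` gives the direction and its size gives the strength, so points that lie exactly on a rising straight line produce a value of 1.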

Identify the type of correlation in the following scatter plot.

**Think:** If we draw a straight line through the points, we will be able to look at the slope of the line and how closely it fits the points. Here is a line that approximates the trend of the data:

**Do:** The line that we drew to approximate the data has a slope of around $+1$, so this is a positive correlation. The line fits quite closely to all of the points, so it is a strong correlation. In summary, we would say that this scatterplot indicates a **strong, positive correlation**.

Describe the correlation between the two variables: eye colour and IQ.

**Think:** Does a person's eye colour have anything to do with their IQ?

**Do:** Eye colour and IQ is an example of a pair of variables that have **no correlation**.

The following table has data results from an experiment.

$X$ | $2$ | $4$ | $7$ | $9$ | $12$ | $15$ | $17$ | $20$ |
---|---|---|---|---|---|---|---|---|
$Y$ | $2$ | $4$ | $6$ | $8$ | $12$ | $18$ | $28$ | $38$ |

Plot the data from the table on the graph below.

What is the type of correlation between the data points? Select the best answer.

- A. Linear Positive
- B. Linear Negative
- C. Nonlinear
- D. No Correlation

The following table shows the number of traffic accidents associated with a sample of drivers of different age groups.

Age | Accidents |
---|---|
$20$ | $41$ |
$25$ | $44$ |
$30$ | $39$ |
$35$ | $34$ |
$40$ | $30$ |
$45$ | $25$ |
$50$ | $22$ |
$55$ | $18$ |
$60$ | $19$ |
$65$ | $17$ |

Which of the following scatter plots correctly represents the above data?

[Scatter plot options A, B and C]

Is the correlation between a person's age and the number of accidents they are involved in positive or negative?

- A. Positive
- B. Negative

Is the correlation between a person's age and the number of accidents they are involved in strong or weak?

- A. Strong
- B. Weak

Which age group's data represent an outlier?

- A. 30-year-olds
- B. None of them
- C. 65-year-olds
- D. 20-year-olds

Consider the table of values that show four excerpts from a database comparing the income per capita of a country and the child mortality rate of the country. If a scatter plot was created from the entire database, what relationship would you expect it to have?

Income per capita | Child Mortality rate |
---|---|
$1465$ | $67$ |
$11428$ | $16$ |
$2621$ | $35$ |
$32468$ | $9$ |

- A. Strongly positive
- B. No relationship
- C. Strongly negative

Create a scatter plot to represent the relationship between two variables, determine the correlation between these variables by testing different regression models using technology, and use a model to make predictions when appropriate.
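As one way of using technology for this, the sketch below fits a least-squares line to the age and accident data from the table in this lesson and uses it to make a prediction. It is an illustration under stated assumptions rather than a prescribed method: the function names are hypothetical, only a linear model is tried, and the prediction is kept inside the observed age range of 20 to 65 because a model may not hold outside the data it was fitted to.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Data copied from the age and accidents table in this lesson.
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
accidents = [41, 44, 39, 34, 30, 25, 22, 18, 19, 17]

slope, intercept = fit_line(ages, accidents)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")

# Predict only inside the observed age range (interpolation).
print(f"predicted accidents at age 42: {predict(slope, intercept, 42):.1f}")
```

Plotting the fitted line over the scatterplot is a useful check before trusting a prediction, since a line is only an appropriate model when the data has a linear form.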