Lines of fit were introduced informally in 8th grade, where we used them to describe linear associations between two quantitative variables. This lesson uses lines of fit as a foundation for discussing correlation and causation.
When looking at bivariate data, it can often appear that the two variables are correlated.
It is important to be able to distinguish between causal relationships (when changes in one variable cause changes in the other variable) and non-causal relationships.
To claim a correlation between two variables, we can examine mathematical calculations that measure the strength of the association between them. Causation, however, can only be established through an appropriately designed statistical experiment.
For bivariate quantitative data, we can describe an association as positive or negative, and as strong or weak (or note that there is no association).
Consider the graph shown:
A line of best fit (or trend line) is a straight line that best represents the data on a scatter plot. We can use lines of best fit to help us make predictions or conclusions about the data.
To draw a line of best fit by eye, follow the overall trend of the data and balance the number of points above the line with the number of points below it. We should generally ignore outliers, as they can skew the line of best fit.
The analysis of bivariate data should include:
A scatter plot can be used to display bivariate data once the independent and dependent variables are defined.
The correlation coefficient, r, is a statistic that can describe both the strength and direction of a linear association.
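The lesson does not state a formula for r. As a rough sketch, assuming the standard Pearson correlation coefficient is what is meant, r can be computed from the deviations of each variable from its mean. The function name `pearson_r` and the sample points are made up for illustration and are not part of the lesson.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up illustration data (not from the lesson): points lying exactly on a line.
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # perfect positive association
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfect negative association
print(rising, falling)
```

Values of r close to 1 or -1 indicate a strong linear association, values close to 0 indicate a weak one, and the sign gives the direction.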
It is important to be able to distinguish between causal relationships (when changes in one variable cause changes in the other variable) and correlation, where the two variables are related but one variable does not necessarily influence the other.
A study was conducted to find the relationship between the age at which a child first speaks and their level of intelligence as teenagers. The following table shows the ages of some teenagers when they first spoke and their results in an aptitude test:
Age when first spoke (months) | 14 | 27 | 9 | 16 | 21 | 17 | 10 | 7 | 19 | 24 |
---|---|---|---|---|---|---|---|---|---|---|
Aptitude test results | 96 | 69 | 93 | 101 | 87 | 92 | 99 | 104 | 93 | 97 |
Create a scatter plot to model the data.
Sketch an approximate line of best fit for the scatter plot and interpret the y-intercept.
Estimate the correlation coefficient and describe the association between the variables.
Determine if there is enough evidence to suggest a causal relationship between the age when a child first speaks and their intelligence as teenagers.
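One way to check a line drawn by eye and an estimated correlation coefficient is to compute them from the table. The sketch below uses a least-squares line, which is a substitute for the by-eye method the question asks for, and assumes r means the Pearson correlation coefficient. The variable names are chosen for illustration.

```python
import math

# Data from the table above.
age_months = [14, 27, 9, 16, 21, 17, 10, 7, 19, 24]
aptitude = [96, 69, 93, 101, 87, 92, 99, 104, 93, 97]

n = len(age_months)
mean_x = sum(age_months) / n
mean_y = sum(aptitude) / n

# Sums of products and squares of deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age_months, aptitude))
s_xx = sum((x - mean_x) ** 2 for x in age_months)
s_yy = sum((y - mean_y) ** 2 for y in aptitude)

# Least-squares slope and intercept (an assumption: the lesson fits by eye).
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Pearson correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"line of best fit: y = {slope:.2f}x + {intercept:.2f}")
print(f"correlation coefficient: r = {r:.2f}")
```

Under these assumptions the slope is about -1.04 and the y-intercept is about 110, with r close to -0.70, suggesting a moderately strong negative association. The y-intercept corresponds to a child speaking at 0 months, so it has little practical meaning here. Because the data come from an observational study rather than a designed experiment, they do not by themselves establish causation.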
Determine whether the following statement is true or false:
"There is a causal relationship between the number of cigarettes a person smokes and their life expectancy."
Consider the graph showing the relationship between the years since purchasing a car and its value in thousands of dollars:
The equation of the line of best fit for the data is y = -2.2x + 30.5. Interpret the slope and y-intercept of the line.
Make a prediction about the value of a car after 10 years.
Estimate the correlation coefficient, describe the association between the variables, and explain whether there is a causal relationship.
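The prediction in this example can be sketched by substituting x = 10 into the given line of best fit, y = -2.2x + 30.5, where x is the number of years since purchase and y is the value in thousands of dollars. The function name `predicted_value` is made up for illustration.

```python
SLOPE = -2.2      # change in value (thousands of dollars) per year
INTERCEPT = 30.5  # predicted value (thousands of dollars) at purchase, x = 0

def predicted_value(years):
    """Predicted car value in thousands of dollars after `years` years."""
    return SLOPE * years + INTERCEPT

value_at_10 = predicted_value(10)
print(f"Predicted value after 10 years: about ${value_at_10 * 1000:,.0f}")
```

Read this way, the slope says the car loses about $2,200 in value each year, and the y-intercept says the predicted value at purchase is about $30,500. A prediction at 10 years is reliable only if that age lies within, or close to, the range of the plotted data.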
Recall that the correlation coefficient can describe both the strength and direction of a linear association.