This chapter revisits ideas about correlation that were discussed in a previous chapter, Scatter plots and lines of fit. We now want to describe correlation with a numerical value instead of a worded description. We will do this using the correlation coefficient.
Play with this applet to see how the correlation coefficient changes. Move each point. Try having one outlier and see how much that can change the correlation coefficient. Try moving the points so they are in a perfect straight line. What happens to the correlation coefficient value?
A correlation coefficient is a value that tells you the strength of a relationship between two variables. It is denoted by the letter $r$.
A perfect positive correlation has a value of $r=1$. That means that if we graphed the variables on the $xy$-plane, it would show a perfect, positive linear relationship. A perfect negative correlation has a value of $r=-1$. It's a perfect negative linear relationship. No correlation has a value of $r=0$, indicating there is no linear relationship between the variables.
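To see these extreme values in action, here is a minimal Python sketch. The helper `pearson_r` and the small data sets are made up for illustration; the helper uses the standard Pearson formula, which this course does not ask you to work out by hand. It also shows how a single outlier can pull $r$ far away from $1$.

```python
import math

def pearson_r(xs, ys):
    """Hypothetical helper: Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up points lying exactly on a rising line, then on a falling line
r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])

# The rising line again, with one made-up outlier at (6, 0)
r_outlier = pearson_r([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 0])

print(round(r_up, 2))       # 1.0
print(round(r_down, 2))     # -1.0
print(round(r_outlier, 2))  # 0.14
```

Five points on a perfect line give $r=1$, yet adding one point far from the line drops the value to about $0.14$.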
So far, so reasonable. What if I have a correlation coefficient of $0.6$? $-0.53$? What do they show?
Well, consider the full range of correlation values, from $-1$ to $1$, as a continuum like this.
Right in the middle is $0$, which we call no correlation.
We further divide up the line to indicate other values with descriptions like Weak, Moderate and Strong (positive or negative).
Where we place these divisions can, in some ways, be a little arbitrary. Ultimately the larger $|r|$ gets, the closer to perfect it is and the closer to $0$, the more it reflects no correlation.
A weak correlation indicates there is some correlation but it is not considered to be very significant. Values of $|r|$ less than $0.5$ are generally considered weak.
A strong correlation indicates that the connection between the variables is quite significant. Exactly where 'strong' begins differs slightly between parts of the world: some sources treat values of $|r|$ larger than $0.7$ as strong, others values larger than $0.8$. Ultimately, what really matters here is the idea that the larger the value of $|r|$, the stronger the relationship!
A moderate correlation falls between weak and strong.
Remember to always state if the correlation is positive or negative by using phrases like "weak negative", "moderate positive", or "strong positive" to describe the relationships between variables.
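The descriptions above can be sketched as a small Python function. The name `describe_correlation` and the exact cut-offs ($0.5$ for weak and $0.7$ for strong) are assumptions made for this illustration; as noted, different sources place the divisions in slightly different spots, so the cut-offs are left adjustable.

```python
def describe_correlation(r, weak_below=0.5, strong_above=0.7):
    """Hypothetical helper: turn a correlation coefficient into a worded description.

    The default cut-offs are one possible choice, not a universal rule.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no correlation"

    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength depends only on how far r is from 0

    if size == 1:
        strength = "perfect"
    elif size < weak_below:
        strength = "weak"
    elif size > strong_above:
        strength = "strong"
    else:
        strength = "moderate"
    return f"{strength} {direction}"

print(describe_correlation(0.6))    # moderate positive
print(describe_correlation(-0.53))  # moderate negative
print(describe_correlation(0.85))   # strong positive
```

With these cut-offs, the two values asked about earlier, $0.6$ and $-0.53$, are described as moderate positive and moderate negative.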
For this course, we will only calculate the correlation coefficient ($r$) using technology. As you study more mathematics, you might learn how to calculate the correlation coefficient on your own.
There are lots of tools we can use to calculate the value of $r$. We can use Excel, Google Sheets, a TI calculator or many other options. This investigation on the line of best fit touches on how to calculate it using Google Sheets.
If you are using a TI-83 or TI-84, here are the instructions:
Enter your data by pressing [STAT] and then selecting 1:Edit. Remember that your independent variable should go in L1 and your dependent variable in L2.
Identify the correlation between the temperature and the number of heaters sold.
For the graph depicted, choose the correlation coefficient that best represents it.
Sean is a hotdog vendor. He records the maximum temperature of the day and the number of hotdogs sold. The results are in the table given.
|Maximum Temperature ($^\circ$C)|$30$|$34$|$33$|$35$|$33$|$28$|$27$|$31$|$37$|$29$|
|Number of hotdogs|$18$|$38$|$26$|$40$|$24$|$8$|$20$|$35$|$43$|$38$|
Plot the information on a scatter plot.
Calculate the correlation coefficient.
Give your answer to two decimal places.
Using the correlation coefficient you calculated in part (b) and the graph you created in part (a), which of the following statements is correct:
When a change in the value of one variable quantity seems to be associated with a proportional change in another variable, we say there is a correlation (or a relationship) between the two variables.
A correlation between variables may be discovered in the course of an experiment or through an analysis of observational data.
In a typical experiment, a researcher sets one variable, called the independent or explanatory variable, to various levels and observes the corresponding values of the other variable, called the dependent or response variable.
In the case of an observational study, more so than in an experiment, care must be taken not to assume that correlation implies causation.
Association or correlation does not imply causation.
In an experiment, it is usually reasonable to think that if values of the independent variable are deliberately chosen and the dependent variable is observed to change accordingly, then there is a causal relation between the independent and dependent variables. However, in an observational study, the values of both variables in the pair are merely observed, not chosen.
Contributing variable: When two variables have an association, they may be connected through a third variable. For example, it was found that there was a strong, positive correlation between ice cream sales and the number of drownings. Does this mean that ice cream causes drowning? Absolutely not. There is a third variable, temperature, which would likely increase both ice cream sales and trips to the beach, and hence drownings.
Coincidence: It is possible in an observational study for variables to be correlated purely by chance, such as the example below.
Thus, care is needed lest a correlation be wrongly taken to imply a causal relationship. To move from the discovery of a correlation to the claim that a causal effect has been found, researchers need to gather evidence external to the data and control as many other variables as possible.
The table shows the number of fans sold at a store during days of various temperatures.
|Number of fans sold|$12$|$13$|$14$|$17$|$18$|$19$|$21$|$23$|
Consider the correlation coefficient $r$ for temperature and number of fans sold. In what range will $r$ be?
Is there a causal relationship?
A study found a strong correlation between the approximate number of pirates out at sea and the average world temperature.
Does this mean that the number of pirates out at sea has an impact on world temperature?
Which of the following is the most likely explanation for the strong correlation?
Which of the following is demonstrated by the strong correlation between the approximate number of pirates out at sea and the average world temperature?
Distinguish between correlation and causation.