7. Categorical & Quantitative Data

Lesson

This chapter revisits ideas about correlation that were discussed in a previous chapter, Scatter plots and lines of fit. We now want to describe correlation with a numerical value instead of a verbal description. We will do this using the correlation coefficient.

Play with this applet to see how the correlation coefficient changes. Move each point. Try having one outlier and see how much that can change the correlation coefficient. Try moving the points so they are in a perfect straight line. What happens to the correlation coefficient value?

- Does association imply causation? Which of the following are association, correlation, or causation?
  - Smoking and lung cancer
  - Vending machines in schools and obesity
  - Taking a placebo pill (inactive/fake treatment) and weight loss
- Ask students for their own examples of association, correlation, and causation.
- What **is** reasonable?

A correlation coefficient is a value that tells you the strength and direction of the linear relationship between two variables. It is denoted by the letter $r$.

A perfect positive correlation has a value of $r=1$. That means that if we graphed the variables on the $xy$-plane, the points would lie exactly on a line with positive slope. A perfect negative correlation has a value of $r=-1$: the points lie exactly on a line with negative slope. No correlation has a value of $r=0$, indicating there is no linear relationship between the variables.

So far, so reasonable. What if I have a correlation coefficient of $0.6$? Or $-0.53$? What do they show?

Consider the full range of values from $-1$ to $1$ as a continuum.

Right in the middle is $0$, which we call no correlation.

We further divide up the line to indicate other values with descriptions like Weak, Moderate and Strong (positive or negative).

Where we place these divisions can, in some ways, be a little arbitrary. Ultimately, the closer $|r|$ gets to $1$, the closer the correlation is to perfect, and the closer $|r|$ gets to $0$, the more it reflects no correlation.

A weak correlation indicates that there is some relationship between the variables, but not a very pronounced one. Values of $|r|$ less than $0.5$ are generally considered weak.

A strong correlation indicates that the connection between the variables is quite pronounced. The exact value at which 'strong' begins differs slightly in different parts of the world: some say values of $|r|$ larger than $0.7$ are strong, others say larger than $0.8$. But ultimately it's the idea that the larger the value of $|r|$, the stronger the relationship, that really matters here!

A moderate correlation falls between weak and strong.

Remember!

Remember to always state if the correlation is positive or negative by using phrases like "weak negative", "moderate positive", or "strong positive" to describe the relationships between variables.
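As a rough illustration, the descriptions above can be turned into a small Python helper. The cut-offs used here ($0.5$ for where weak ends and $0.7$ for where strong begins) are only one of the conventions mentioned above, and the function name `describe_correlation` is made up for this sketch.

```python
def describe_correlation(r, weak_below=0.5, strong_from=0.7):
    """Return a phrase such as 'moderate positive' for a correlation coefficient r.

    The default cut-offs are an assumption: |r| < 0.5 is called weak and
    |r| >= 0.7 is called strong. Other sources place 'strong' at 0.8.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no correlation"

    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size < weak_below:
        strength = "weak"
    elif size < strong_from:
        strength = "moderate"
    else:
        strength = "strong"

    # Always report the direction together with the strength.
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_correlation(0.6))    # moderate positive
print(describe_correlation(-0.53))  # moderate negative
```

Passing `strong_from=0.8` switches to the other convention, which changes how values between $0.7$ and $0.8$ are described.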

For this course, we will only calculate the correlation coefficient ($r$) using technology. As you study more mathematics, you might learn how to calculate the correlation coefficient on your own.

There are lots of tools we can use to calculate the value of $r$. We can use Excel, Google Sheets, a TI calculator, or many other options. This investigation on the line of best fit touches on how to calculate it using Google Sheets.
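Python is another option. The sketch below computes $r$ with the standard Pearson formula, which this course does not ask you to apply by hand. The function name `pearson_r` and the hours/score data are made up for illustration and do not come from this lesson.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys.

    Uses r = Sxy / sqrt(Sxx * Syy), where Sxy, Sxx and Syy are sums of
    products of deviations from the means.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two points")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours studied and quiz score for five students.
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
print(round(pearson_r(hours, score), 2))  # 0.98

# Points that lie exactly on a rising straight line give r = 1.
print(round(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]), 2))  # 1.0
```

A value of about $0.98$ would be described as a strong positive correlation.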

If you are using a TI-83 or TI-84 here are the instructions:

- Ensure your calculator is set to DiagnosticOn by pressing `[2nd]` and then `[0]`, scrolling to `DiagnosticOn` and pressing `[Enter]`.
- Enter all of your data by pressing `[STAT]` and then selecting `1:Edit`. Remember that your independent variable should go in L1 and your dependent variable in L2.
- Once your data is in, press `[STAT]`, then select `CALC` and `4:LinReg(ax+b)`.

Identify the correlation between the temperature and the number of heaters sold.

- (A) A positive correlation
- (B) A negative correlation
- (C) No correlation

For the graph depicted, choose the correlation coefficient that best represents it.

- (A) $-1$
- (B) $1$
- (C) $0.67$
- (D) $0$

Sean is a hotdog vendor. He records the maximum temperature of each day and the number of hotdogs sold. The results are given in the table.

| Maximum temperature ($^\circ$C) | $30$ | $34$ | $33$ | $35$ | $33$ | $28$ | $27$ | $31$ | $37$ | $29$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of hotdogs | $18$ | $38$ | $26$ | $40$ | $24$ | $8$ | $20$ | $35$ | $43$ | $38$ |

(a) Plot the information on a scatter plot.

(b) Calculate the correlation coefficient. Give your answer to two decimal places.

(c) Using the correlation coefficient you calculated in part (b) and the graph you created in part (a), which of the following statements is correct?

- (A) There is no evidence of a linear relationship between sales and temperature.
- (B) As the temperature increases, the sales tend to decrease.
- (C) As the temperature increases, the sales increase.
- (D) As the temperature increases, the sales tend to increase.
- (E) As the temperature increases, the sales decrease.

When a change in the value of one variable quantity seems to be associated with a proportional change in another variable, we say there is a correlation (or a relationship) between the two variables.

A correlation between variables may be discovered in the course of an experiment or through an analysis of observational data.

In a typical experiment, a researcher sets one variable, called the **independent** or **explanatory** variable, to various levels and observes the corresponding values of the other variable, called the **dependent** or **response** variable.

In the case of an observational study, more so than in an experiment, care must be taken not to assume that correlation implies causation.

Remember!

Association or correlation does not imply causation.

In an experiment, it is usually reasonable to think that if values of the independent variable are deliberately chosen and the dependent variable is observed to change accordingly, then there is a causal relation between the independent and dependent variables. However, in an observational study, the values of both variables in the pair are merely observed, not chosen.

**Contributing variable**: When two variables have an association, they may be connected through a third variable. For example, it was found that there was a strong, positive correlation between ice cream sales and the number of drownings. Does this mean that ice cream causes drowning? Absolutely not. There is a third variable, temperature, which would likely increase both ice cream sales and trips to the beach, and hence drownings.
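The ice cream example can be sketched with a few made-up numbers. Every value below is invented for illustration and is not real data, and `pearson_r` is a hypothetical helper that computes the correlation coefficient with the standard Pearson formula. Both ice cream sales and drownings are chosen to rise with temperature, and that alone is enough to make them strongly correlated with each other.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented values for five days (assumption, not real observations).
temperature = [18, 22, 26, 30, 34]      # degrees Celsius
ice_cream_sales = [40, 58, 52, 75, 76]  # cones sold
drownings = [1, 2, 2, 4, 5]             # incidents recorded

# Each variable tracks temperature, the contributing variable.
print(round(pearson_r(temperature, ice_cream_sales), 2))
print(round(pearson_r(temperature, drownings), 2))

# So the two also track each other, even though neither causes the other.
print(round(pearson_r(ice_cream_sales, drownings), 2))
```

All three printed values are above $0.9$, yet nothing in the calculation says which variable, if any, is driving the others. That part has to come from reasoning about the situation.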

**Coincidence**: It is possible in an observational study for variables to be correlated purely by chance, such as the example below.

Thus, care is needed lest a correlation be wrongly taken to imply a causal relationship. To move from the discovery of a correlation to the claim that a causal effect has been found, researchers need to gather evidence external to the data and control as many other variables as possible.

The table shows the number of fans sold at a store during days of various temperatures.

| Temperature ($^\circ$C) | $6$ | $8$ | $10$ | $12$ | $14$ | $16$ | $18$ | $20$ |
|---|---|---|---|---|---|---|---|---|
| Number of fans sold | $12$ | $13$ | $14$ | $17$ | $18$ | $19$ | $21$ | $23$ |

(a) Consider the correlation coefficient $r$ for temperature and number of fans sold. In what range will $r$ be?

- (A) $r=0$
- (B) $r>0$
- (C) $r<0$

(b) Is there a causal relationship?

- (A) Yes
- (B) No

A study found a strong correlation between the approximate number of pirates out at sea and the average world temperature.

(a) Does this mean that the number of pirates out at sea has an impact on world temperature?

- (A) Yes
- (B) No

(b) Which of the following is the most likely explanation for the strong correlation?

- (A) Contributing variables - there are other causal relationships and variables that come into play and these may lead to an indirect positive association between the approximate number of pirates out at sea and the average world temperature.
- (B) Coincidence - there are no other contributing factors or reasonable arguments to be made for the strong positive association between the approximate number of pirates out at sea and the average world temperature.

(c) Which of the following is demonstrated by the strong correlation between the approximate number of pirates out at sea and the average world temperature?

- (A) If there is correlation between two variables, then there must be causation.
- (B) If there is correlation between two variables, there isn't necessarily causation.
- (C) If there is correlation between two variables, then there is no causation.

Summarize, represent, and interpret data on two categorical and quantitative variables.

Analyze linear models to make interpretations based on the data.