In the Investigation in this chapter, we learnt how to find the correlation coefficient and the line of best fit using spreadsheets. In this lesson, we will use those processes to answer further questions.
Use technology to find the line of best fit for the data below. Write the equation with the coefficient and constant term correct to two decimal places.
$x$ | $24$ | $37$ | $19$ | $31$ | $32$ | $22$ | $14$ | $30$ | $23$ | $40$ |
---|---|---|---|---|---|---|---|---|---|---|
$y$ | $-7$ | $-8$ | $-3$ | $-6$ | $-9$ | $-8$ | $-2$ | $-8$ | $-8$ | $-12$ |
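Spreadsheet functions such as SLOPE and INTERCEPT (in Excel or Google Sheets) fit the line of best fit by least squares. To see what they compute, here is a minimal Python sketch of the same calculation for the data above, using only built-in functions. It assumes the spreadsheet's line of best fit is the ordinary least-squares line of $y$ on $x$; the helper name `least_squares_line` is our own and not part of any library.

```python
# Least-squares line of best fit computed "by hand" (no external libraries).
# Data: the table above.
x = [24, 37, 19, 31, 32, 22, 14, 30, 23, 40]
y = [-7, -8, -3, -6, -9, -8, -2, -8, -8, -12]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # S_xy: sum of products of deviations; S_xx: sum of squared x-deviations.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = s_xy / s_xx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


m, b = least_squares_line(x, y)
print(f"y = {m:.2f}x + {b:.2f}")
```

The slope is $S_{xy}/S_{xx}$, and the intercept follows from the fact that the least-squares line passes through the point $(\bar{x}, \bar{y})$.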
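The correlation coefficient, denoted by the letter $r$, is a measure of the strength of the linear relationship between two variables. The sign of $r$ also tells us the direction of the relationship: positive when $y$ tends to increase as $x$ increases, and negative when $y$ tends to decrease as $x$ increases. Its value always lies between $-1$ and $1$.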
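For reference, the value returned by a spreadsheet's CORREL function is Pearson's product-moment correlation coefficient. Assuming that is the coefficient meant here (it is the usual one when working with a line of best fit), for $n$ data pairs $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$ it is

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Whenever neither variable is constant, the denominator is positive, so the sign of $r$ comes from the numerator: it is positive when above-average $x$ values tend to pair with above-average $y$ values, and negative when they tend to pair with below-average $y$ values. The Cauchy-Schwarz inequality guarantees that $-1 \le r \le 1$.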
Some key aspects of the correlation coefficient are summarised below.
The strength of the correlation depends on the magnitude of $r$, so we can ignore the positive or negative sign when judging strength.
Even when two variables have a strong relationship and $r$ is close to $1$ or $-1$, we cannot conclude that a change in one variable causes a change in the other.
Given the following data:
$x$ | $1$ | $4$ | $7$ | $10$ | $13$ | $16$ | $19$ |
---|---|---|---|---|---|---|---|
$y$ | $4$ | $4.25$ | $4.55$ | $4.4$ | $4.45$ | $4.75$ | $4.2$ |
Calculate the correlation coefficient and give your answer to two decimal places.
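Pearson's correlation coefficient can be computed directly from the deviations of each value from its mean. This minimal Python sketch does so for the data above using only the standard library; `pearson_r` is a hypothetical helper name, and a spreadsheet's CORREL function should give the same value.

```python
from math import sqrt

# Data: the table above.
x = [1, 4, 7, 10, 13, 16, 19]
y = [4, 4.25, 4.55, 4.4, 4.45, 4.75, 4.2]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - mean_x) ** 2 for xi in xs)
    s_yy = sum((yi - mean_y) ** 2 for yi in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(x, y)
print(f"r = {r:.2f}")
```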
Choose the best description of this correlation.
Moderate negative
Strong positive
Weak negative
Moderate positive
Strong negative
Weak positive
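Once we have determined the equation of the line of best fit, we can use it as a model to predict the likely value of the response variable for a given value of the explanatory variable. A prediction made for a value of the explanatory variable inside the range of the data is called interpolation; a prediction made outside that range is called extrapolation and is generally less reliable.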
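The idea in the previous paragraph can be sketched in a few lines of Python. `predict_with_line` is a hypothetical helper name: it evaluates $y = mx + b$ at a new value of the explanatory variable and also reports whether that value lies within the range of the observed values (interpolation) or outside it (extrapolation).

```python
def predict_with_line(m, b, x_new, observed_x):
    """Predict y from the line y = m*x + b.

    Returns (y_hat, is_interpolation), where is_interpolation is True when
    x_new lies between the smallest and largest observed x values.
    """
    y_hat = m * x_new + b
    is_interpolation = min(observed_x) <= x_new <= max(observed_x)
    return y_hat, is_interpolation
```

For instance, the $x$ values in the first table range from $14$ to $40$, so a prediction at $x = 25$ would be an interpolation and one at $x = 60$ an extrapolation. A range check alone does not make a prediction reliable; the strength of the correlation matters too.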
Researchers recorded the average number of cigarettes smoked per day during pregnancy and the birth weight of the resulting newborn baby.
Average number of cigarettes per day ($x$) | $46.20$ | $13.60$ | $21.60$ | $25.00$ | $9.20$ | $37.50$ | $1.20$ | $17.60$ | $10.60$ | $13.00$ | $36.80$ | $19.40$ |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Birth weight in kilograms ($y$) | $4.00$ | $5.90$ | $4.90$ | $4.90$ | $5.70$ | $4.40$ | $7.10$ | $5.10$ | $5.40$ | $5.20$ | $3.90$ | $5.90$ |
Using technology, calculate the correlation coefficient between the average number of cigarettes per day and birth weight.
Give your answer to three decimal places.
Choose the option that best describes the statistical relationship between these two variables.
Strong negative linear relationship
Weak relationship
Strong positive linear relationship
Moderate positive linear relationship
Moderate negative linear relationship
Use your spreadsheet to form an equation for the line of best fit of $y$ on $x$.
Give the equation of the line in the form $y=mx+b$, with all values correct to two decimal places.
Use your equation to predict the birth weight of a newborn whose mother smoked on average $5$ cigarettes per day.
Give your answer to two decimal places.
Choose the option that best describes the validity of the prediction in part (d).
Despite an interpolated prediction, unreliable due to a moderate to weak correlation.
Despite a strong correlation, unreliable due to extrapolation.
Very unreliable due to extrapolation and a moderate to weak correlation.
Reliable due to interpolation and a strong correlation.