Regression Analysis

Lesson

Being able to fit a line to data is an important skill, but being able to interpret the fitted line is just as important. Interpreting the line allows us to analyse the original data and make important judgments about it.

Let's consider an example:

The average number of pages read to a child each day and the child’s growing vocabulary are measured. Some data is given below.

Pages read per day | $25$ | $27$ | $29$ | $3$ | $13$ | $31$ | $18$ | $29$ | $29$ | $5$ |
---|---|---|---|---|---|---|---|---|---|---|
Total vocabulary | $402$ | $440$ | $467$ | $76$ | $220$ | $487$ | $295$ | $457$ | $460$ | $106$ |

**Think**: Before we begin any calculations and interpretations, let's consider what is going on in this question. Somebody wants to know if there is a relationship between the number of pages a child reads each day and their total vocabulary. A child's total vocabulary is an important indicator of their overall educational development, especially their reading.

Looking at the table, we can see that children who read a large number of pages each day, such as the first three values, $25$, $27$ and $29$, have quite large total vocabularies compared with children who read only a few pages, such as $3$ or $5$. It seems there will be a positive relationship, but let's work through the rest of the example to find out, and to see what other information we can gather and interpret.

a) Find the equation of the Least Squares Regression Line for the Total Vocabulary ($y$) in terms of Pages Read per Day ($x$).

Round all values to one decimal place.

**Think**: For this type of question, we need to use some form of technology, as calculating the line by hand takes a long time. When using a graphics calculator or other technology, enter the $x$ values in the first list or column and the $y$ values in the second list or column.

You can then fit a linear regression to the data using your graphics calculator, or other technology. This will help you find the equation of the Least Squares Regression Line.

**Do**: After following the appropriate process according to your graphics calculator, or other technology, you should end up with the equation:

$y=14.9x+30.1$
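If a graphics calculator is not available, the same line can be computed from the standard least squares formulas $m = S_{xy}/S_{xx}$ and $b = \bar{y} - m\bar{x}$, where $S_{xy}$ and $S_{xx}$ are the sums of products and of squares of deviations from the means $\bar{x}$ and $\bar{y}$. The Python sketch below applies these formulas to the table above; the function name `least_squares_line` is our own illustration, not part of any calculator or library. With the data exactly as printed, it gives a gradient of about $14.87$ and an intercept of about $30.26$, so the gradient rounds to $14.9$ as above, while the intercept rounds to $30.3$ rather than $30.1$.

```python
# Pages read per day (x) and total vocabulary (y), copied from the table above.
pages = [25, 27, 29, 3, 13, 31, 18, 29, 29, 5]
vocab = [402, 440, 467, 76, 220, 487, 295, 457, 460, 106]


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_xy: sum of products of deviations; S_xx: sum of squared x deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    # The least squares line passes through the centroid (x_bar, y_bar),
    # which fixes the intercept once the gradient is known.
    b = y_bar - m * x_bar
    return m, b


m, b = least_squares_line(pages, vocab)
print(f"y = {m:.1f}x + {b:.1f}")
```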

b) State the $y$-intercept.

**Think**: When an equation is of the form:

$y=mx+b$

$m$ is the gradient of the line, and $b$ is the $y$-intercept.

What is the value of $b$ in this case?

**Do**: In this case, $y=14.9x+30.1$, so the $y$-intercept, which is the constant term, is $30.1$.

c) The $y$-intercept indicates that when a child doesn't read any pages each day, their vocabulary is, on average, _____ words.

**Think**: The $y$-intercept occurs when $x=0$.

In this case, the $x$-variable is Pages Read per Day.

How can we use this information to determine how many words are in a child's vocabulary when they don't read any pages?

If $x=0$, then Pages Read per Day $=0$, which means the child doesn't read any pages.

Therefore, the $y$-intercept is the number of words in a child's vocabulary when they don't read any pages.

**Do**: From part (b), the $y$-intercept is $30.1$. Therefore, the $y$-intercept indicates that when a child doesn't read any pages each day, their vocabulary is, on average, $30.1$ words.
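Substituting $x=0$ into the regression equation $y=14.9x+30.1$ shows this directly:

$$y = 14.9(0) + 30.1 = 30.1$$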

d) Does the interpretation in the previous part make sense in this context? We have the following two options to consider.

A: No, when the explanatory variable has a value of zero, this is outside the data range and the value of the dependent variable does not make sense.

B: Yes, when the explanatory variable has a value of zero, this is still within the data range and the value of the dependent variable makes sense.

**Think**: In this case, the explanatory variable is **Pages Read per Day**.

Does a value of **Pages Read per Day** $=0$ lie within the data range?

Does a value of **Pages Read per Day** $=0$ make sense?

**Do**: A value of **Pages Read per Day** $=0$ does lie within the data range and does make sense, because a child could read $0$ pages per day, which means they do no reading practice at all. Therefore, the answer is B: Yes, when the explanatory variable has a value of zero, this is still within the data range and the value of the dependent variable makes sense.

e) State the gradient of the line.

**Think**: When an equation is of the form:

$y=mx+b$

$m$ is the gradient of the line, and $b$ is the $y$-intercept.

What is the value of $m$ in this case?

**Do**: In this case, $y=14.9x+30.1$, so the gradient, which is the coefficient of $x$, is $14.9$.

f) Which of the following is true?

A: The gradient of the line indicates that the bivariate data set has a negative correlation.

B: The gradient of the line indicates that the bivariate data set has a positive correlation.

**Think**: The sign of the gradient, positive or negative, tells us whether the bivariate data has a positive or negative correlation.

Is this gradient positive or negative?

In part (e), we found that the gradient of the line is $14.9$, which is a positive value.

Therefore, is the correlation positive or negative?

Remember:

Positive Gradient $=$ Positive Correlation

Negative Gradient $=$ Negative Correlation

**Do**: As the gradient is positive, the correlation is positive. Therefore, the answer is B: The gradient of the line indicates that the bivariate data set has a positive correlation.
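One way to see why the sign of the gradient matches the direction of the correlation: the least squares gradient $m = S_{xy}/S_{xx}$ and Pearson's correlation coefficient $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ share the numerator $S_{xy}$, and both denominators are positive. The Python sketch below is our own illustration using the pages-read table; it computes both quantities for that data and shows they have the same (positive) sign, with $r$ very close to $1$.

```python
import math

# Data from the pages-read table (x = pages read per day, y = total vocabulary).
pages = [25, 27, 29, 3, 13, 31, 18, 29, 29, 5]
vocab = [402, 440, 467, 76, 220, 487, 295, 457, 460, 106]

n = len(pages)
x_bar = sum(pages) / n
y_bar = sum(vocab) / n

# Sums of squared deviations and of products of deviations from the means.
s_xx = sum((x - x_bar) ** 2 for x in pages)
s_yy = sum((y - y_bar) ** 2 for y in vocab)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, vocab))

gradient = s_xy / s_xx             # least squares gradient
r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson's correlation coefficient

# s_xx and s_yy are positive, so gradient and r always take the sign of s_xy.
print(f"gradient = {gradient:.1f}, r = {r:.4f}")
```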

g) Which of the following is true? There are four options to consider.

A: If the explanatory variable increases by $1$ unit, then the dependent variable increases by $30.1$ units.

B: If the explanatory variable increases by $1$ unit, then the dependent variable increases by $14.9$ units.

C: If the explanatory variable increases by $1$ unit, then the dependent variable decreases by $14.9$ units.

D: If the explanatory variable increases by $1$ unit, then the dependent variable decreases by $30.1$ units.

**Think**: The gradient tells us how many units the dependent variable changes by for every increase of $1$ unit in the explanatory variable.

Is the line increasing or decreasing?

Remember:

A positive gradient means the line is increasing.

A negative gradient means the line is decreasing.

**Do**: In part (e), we found that the gradient of the line is $14.9$, which is a positive value, so the line is increasing. Therefore, the answer is B: If the explanatory variable increases by $1$ unit, then the dependent variable increases by $14.9$ units.
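To see why the gradient is the change in $y$ for each extra unit of $x$, compare the values of $y$ given by the line $y=14.9x+30.1$ at $x$ and at $x+1$:

$$\begin{aligned} y(x+1) - y(x) &= \big(14.9(x+1) + 30.1\big) - \big(14.9x + 30.1\big) \\ &= 14.9 \end{aligned}$$

The intercept $30.1$ cancels, which is why it plays no part in the rate of change.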

**Think**: Let's consider what this means for a child's vocabulary development. The gradient suggests that for every extra page a child reads each day, their total vocabulary is expected to be about $14.9$ words larger. This change won't happen instantly, and the time frames involved would need to be investigated in follow-up studies, but this does give us a good indication of the relationship between the number of pages a child reads each day and their total vocabulary.

A least squares regression line is given by $y=3.59x+6.72$.

a) State the gradient of the line.

b) Which of the following is true?

A: The gradient of the line indicates that the bivariate data set has a positive correlation.

B: The gradient of the line indicates that the bivariate data set has a negative correlation.

c) Which of the following is true?

A: If $x$ increases by $1$ unit, then $y$ increases by $3.59$ units.

B: If $x$ increases by $1$ unit, then $y$ decreases by $3.59$ units.

C: If $x$ increases by $1$ unit, then $y$ decreases by $6.72$ units.

D: If $x$ increases by $1$ unit, then $y$ increases by $6.72$ units.

d) State the value of the $y$-intercept.

The following scattergraph shows the value of various $4$-bedroom, $2$-bathroom homes in a new suburb.

a) State the value of the $y$-intercept.

b) The $y$-intercept indicates that when a house is brand new, its value is, on average, _____ dollars.

c) Does the interpretation in the previous part make sense in this context?

A: Yes, when the explanatory variable has a value of zero, this is still within the data range and the value of the dependent variable makes sense.

B: No, when the explanatory variable has a value of zero, this is outside the data range and the value of the dependent variable does not make sense.

d) Use the $y$-intercept and the centroid (whose coordinates are given on the graph) to calculate the gradient of the Least Squares Regression Line. Give your answer to one decimal place.
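The Least Squares Regression Line always passes through the centroid $(\bar{x}, \bar{y})$, and the $y$-intercept $b$ gives a second point on the line, $(0, b)$. The gradient can therefore be found as the rise over the run between these two points:

$$m = \frac{\bar{y} - b}{\bar{x} - 0} = \frac{\bar{y} - b}{\bar{x}}$$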

e) Which of the following is true?

A: The gradient of the line indicates that the bivariate data set has a positive correlation.

B: The gradient of the line indicates that the bivariate data set has a negative correlation.

f) Which of the following is true?

A: If the explanatory variable increases by $1$ unit, then the dependent variable decreases by $409032.4$ units.

B: If the explanatory variable increases by $1$ unit, then the dependent variable increases by $14811$ units.

C: If the explanatory variable increases by $1$ unit, then the dependent variable increases by $409032.4$ units.

D: If the explanatory variable increases by $1$ unit, then the dependent variable decreases by $14811$ units.

The prices of various second-hand Mitsubishi Lancers are shown below.

Age (years) | $1$ | $2$ | $0$ | $5$ | $7$ | $4$ | $3$ | $4$ | $8$ | $2$ |
---|---|---|---|---|---|---|---|---|---|---|
Value (dollars) | $16000$ | $13000$ | $21990$ | $10000$ | $8600$ | $12500$ | $11000$ | $11000$ | $4500$ | $14500$ |

a) Find the equation of the Least Squares Regression Line for the price ($y$) in terms of age ($x$).

Round all values to the nearest integer.
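The equation asked for above can be checked with the same least squares formulas, $m = S_{xy}/S_{xx}$ and $b = \bar{y} - m\bar{x}$. The Python sketch below is our own illustration (not a calculator procedure) and simply applies these formulas to the Mitsubishi Lancer table, rounding to the nearest integer as the question asks; its output can be compared with the equation your graphics calculator gives.

```python
# Age (x) and value in dollars (y) of the second-hand Mitsubishi Lancers above.
age = [1, 2, 0, 5, 7, 4, 3, 4, 8, 2]
value = [16000, 13000, 21990, 10000, 8600, 12500, 11000, 11000, 4500, 14500]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(value) / n

# Sum of products of deviations and sum of squared x deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, value))
s_xx = sum((x - x_bar) ** 2 for x in age)

m = s_xy / s_xx          # gradient of the least squares line
b = y_bar - m * x_bar    # intercept, using the centroid (x_bar, y_bar)

# Round both values to the nearest integer, as the question asks.
print(f"y = {m:.0f}x + {b:.0f}")
```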

b) State the value of the $y$-intercept.

c) The value of the $y$-intercept indicates that when a Mitsubishi Lancer is brand new, its value is, on average, _____ dollars.

d) Does the interpretation in the previous part make sense in this context?

A: Yes, when the explanatory variable has a value of zero, this is still within the data range and the value of the dependent variable makes sense.

B: No, when the explanatory variable has a value of zero, this is outside the data range and the value of the dependent variable does not make sense.

e) State the gradient of the line.

f) Which of the following is true?

A: If the explanatory variable increases by $1$ unit, then the dependent variable increases by $1694$ units.

B: If the explanatory variable increases by $1$ unit, then the dependent variable increases by $18407$ units.

C: If the explanatory variable increases by $1$ unit, then the dependent variable decreases by $18407$ units.

D: If the explanatory variable increases by $1$ unit, then the dependent variable decreases by $1694$ units.

S7-2 Make inferences from surveys and experiments:

A: making informal predictions, interpolations, and extrapolations

B: using sample statistics to make point estimates of population parameters

C: recognising the effect of sample size on the variability of an estimate

Use statistical methods to make an inference