Fitting a line to data is an important skill, but being able to interpret the fitted line is just as important. Interpretation lets us analyse the original data and make judgments about it.
Let's consider an example:
The average number of pages read to a child each day and the child’s growing vocabulary are measured. Some data is given below.
Pages read per day | $25$ | $27$ | $29$ | $3$ | $13$ | $31$ | $18$ | $29$ | $29$ | $5$ |
---|---|---|---|---|---|---|---|---|---|---|
Total vocabulary | $402$ | $440$ | $467$ | $76$ | $220$ | $487$ | $295$ | $457$ | $460$ | $106$ |
Think: Before we begin any calculations and interpretations, let's consider what is going on in this question. Somebody wants to know if there is a relationship between the number of pages a child reads each day and their total vocabulary. A child's total vocabulary is an important indicator in a child's overall educational development, especially in regard to their reading.
By looking at the table, we can see that children who read a large number of pages each day, like the first three values, $25$, $27$ and $29$, have quite large total vocabularies compared to those who read only a few pages, like $3$ or $5$. It seems there will be a positive relationship, but let's work through the rest of the example to find out, and also to see what other information we can gather and interpret.
a) Find the equation of the Least Squares Regression Line for the Total Vocabulary ($y$) in terms of Pages Read per Day ($x$).
Round all values to one decimal place.
Think: For this type of question, we need to use some form of technology to help us, as the manual process is very, very long. When using a graphics calculator, or other technology, we need to enter the $x$ values in the first list or column, and the $y$ values in the second list or column.
You can then fit a linear regression to the data using your graphics calculator, or other technology. This will help you find the equation of the Least Squares Regression Line.
Do: After following the appropriate process according to your graphics calculator, or other technology, you should end up with the equation:
$y=14.9x+30.1$
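For readers without a graphics calculator, the calculation that the technology carries out can be sketched in a few lines of Python. This is a minimal illustration rather than the method any particular calculator uses: the function name `least_squares_line` and the small data set are made up for the example, chosen so the arithmetic is easy to check by hand. The slope is the sum of the products of the deviations from the means, divided by the sum of the squared $x$ deviations, and the intercept follows from the fact that the line passes through the point of means.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The line passes through the point of means (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, not taken from the table above.
example_x = [1, 2, 3, 4]
example_y = [3, 5, 6, 9]
slope, intercept = least_squares_line(example_x, example_y)
print(f"y = {slope:.1f}x + {intercept:.1f}")
```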
b) State the $y$-intercept.
Think: When an equation is of the form:
$y=mx+b$
$m$ is the slope of the line, and $b$ is the $y$-intercept.
What is the value of $b$ in this case?
Do: In this case, $y=14.9x+30.1$, and so the $y$-intercept, which is the number on its own, is equal to $30.1$.
c) The $y$-intercept indicates that when a child doesn't read any pages each day, then their vocabulary, on average, is _____ words.
Think: The $y$-intercept occurs when $x=0$.
In this case, the $x$-variable is Pages Read per Day.
How can we use this information to determine how many words they have in their vocabulary when they don't read any pages?
As $x=0$, this means Pages Read per Day $=0$, which means a child doesn't read any pages.
Therefore, the $y$-intercept is the number of words in their vocabulary when the child doesn't read any pages.
Do: From part (b), the $y$-intercept is $30.1$. Therefore, the $y$-intercept indicates that when a child doesn't read any pages each day, then their vocabulary, on average, is $30.1$ words.
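The link between the $y$-intercept and $x=0$ can be checked by substituting directly into the equation from part (a). The sketch below is only an illustration; the function name `predicted_vocabulary` is an assumption made for this example.

```python
def predicted_vocabulary(pages_per_day):
    """Predicted total vocabulary from the fitted line y = 14.9x + 30.1."""
    return 14.9 * pages_per_day + 30.1


# Substituting x = 0 leaves only the constant term, the y-intercept.
print(predicted_vocabulary(0))  # prints 30.1
```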
d) Does the interpretation in the previous part make sense in this context? We have the following two options to consider.
A: No, when the independent variable has a value of zero, this is outside the data range and the value of the dependent variable does not make sense.
B: Yes, when the independent variable has a value of zero, this is still within the data range and the value of the dependent variable makes sense.
Think: In this case, the independent variable is Pages Read per Day.
Does a value of Pages Read per Day $=0$ lie within the data range?
Does a value of Pages Read per Day $=0$ make sense?
Do: A value of Pages Read per Day $=0$ does lie within the data range and does make sense, because a child could read $0$ pages per day, which means they do no reading practice at all. Therefore, the answer is B: Yes, when the independent variable has a value of zero, this is still within the data range and the value of the dependent variable makes sense.
e) State the slope of the line.
Think: When an equation is of the form:
$y=mx+b$
$m$ is the slope of the line, and $b$ is the $y$-intercept.
What is the value of $m$ in this case?
Do: In this case, $y=14.9x+30.1$, and so the slope, which is the number next to the $x$, is equal to $14.9$.
f) Which of the following is true?
A: The slope of the line indicates that the bivariate data set has a negative correlation. OR
B: The slope of the line indicates that the bivariate data set has a positive correlation.
Think: The sign of the slope, positive or negative, tells us whether the bivariate data has a positive or negative correlation.
Is this slope positive or negative?
In part (e), we found that the slope of the line is $14.9$, which is a positive value.
Therefore, is the correlation positive or negative?
Remember:
Positive Slope $=$ Positive Correlation
Negative Slope $=$ Negative Correlation
Do: As we have a positive slope, this means we have a positive correlation. Therefore, the answer is B: The slope of the line indicates that the bivariate data set has a positive correlation.
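The two signs always agree because the slope and the correlation coefficient $r$ share the same numerator, the sum of the products of the deviations from the means, while their denominators are always positive. The following sketch illustrates this under stated assumptions: the helper names `slope_and_r` and `direction` are invented here, the first data set is the table from this example, and the second is a made-up data set in which $y$ falls as $x$ rises.

```python
import math


def slope_and_r(xs, ys):
    """Return (slope, r) for the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    # Both quantities have s_xy on top, so they always share its sign.
    return s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)


def direction(value):
    """Describe the sign of a number in words."""
    return "positive" if value > 0 else "negative" if value < 0 else "zero"


pages = [25, 27, 29, 3, 13, 31, 18, 29, 29, 5]
vocabulary = [402, 440, 467, 76, 220, 487, 295, 457, 460, 106]
# Hypothetical data in which y falls as x rises.
falling_x = [1, 2, 3, 4, 5]
falling_y = [10, 8, 7, 4, 1]

for xs, ys in [(pages, vocabulary), (falling_x, falling_y)]:
    slope, r = slope_and_r(xs, ys)
    print(f"slope is {direction(slope)}, correlation is {direction(r)}")
```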
g) Which of the following is true? (we have 4 options to consider)
A: If the independent variable increases by $1$ unit, then the dependent variable increases by $30.1$ units.
B: If the independent variable increases by $1$ unit, then the dependent variable increases by $14.9$ units.
C: If the independent variable increases by $1$ unit, then the dependent variable decreases by $14.9$ units.
D: If the independent variable increases by $1$ unit, then the dependent variable decreases by $30.1$ units.
Think: The slope tells us how many units the dependent variable changes by for every $1$ unit increase in the independent variable.
Is the line increasing or decreasing?
Remember:
Positive slope means the line is increasing.
Negative slope means the line is decreasing.
Do: In part (e), we found that the slope of the line is $14.9$, which is a positive value, which means the line is increasing. Therefore, the answer is B: If the independent variable increases by $1$ unit, then the dependent variable increases by $14.9$ units.
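One way to see why the slope gives the change per unit is to compare the predictions for two $x$-values that are $1$ apart. The sketch below uses the equation from part (a); the function name and the choice of $10$ and $11$ pages are assumptions made purely for illustration.

```python
def predicted_vocabulary(pages_per_day):
    """Predicted total vocabulary from the fitted line y = 14.9x + 30.1."""
    return 14.9 * pages_per_day + 30.1


# Predictions for two children whose daily reading differs by one page.
change = predicted_vocabulary(11) - predicted_vocabulary(10)
# Round to one decimal place to hide floating-point noise.
print(round(change, 1))  # prints 14.9
```

The intercept cancels in the subtraction, which is why the answer does not involve $30.1$.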
Think: Let's consider what this means for the vocabulary development of a child. The slope suggests that for every extra page of reading a child does each day, their total vocabulary will, on average, be $14.9$ words larger. This change won't happen instantly, and the time frames for this change would need to be investigated in follow-up studies, but this does give us a good indication of the relationship between the number of pages a child reads each day and their total vocabulary.
A least squares regression line is given by $y=3.59x+6.72$.
a) State the slope of the line.
b) Which of the following is true?
A: The slope of the line indicates that the bivariate data set has a positive correlation.
B: The slope of the line indicates that the bivariate data set has a negative correlation.
c) Which of the following is true?
A: If $x$ increases by $1$ unit, then $y$ increases by $3.59$ units.
B: If $x$ increases by $1$ unit, then $y$ decreases by $3.59$ units.
C: If $x$ increases by $1$ unit, then $y$ decreases by $6.72$ units.
D: If $x$ increases by $1$ unit, then $y$ increases by $6.72$ units.
d) State the value of the $y$-intercept.
The following scattergraph shows the value of various $4$ bedroom, $2$ bathroom homes in a new suburb.
a) State the value of the $y$-intercept.
b) The $y$-intercept indicates that when a house is brand new, its value is, on average, _____ dollars.
c) Does the interpretation in the previous part make sense in this context?
A: Yes, when the independent variable has a value of zero, this is still within the data range and the value of the dependent variable makes sense.
B: No, when the independent variable has a value of zero, this is outside the data range and the value of the dependent variable does not make sense.
d) Use the $y$-intercept and the centroid (whose coordinates are given on the graph) to calculate the slope of the Least Squares Regression Line.
Give your answer to one decimal place.
e) Which of the following is true?
A: The slope of the line indicates that the bivariate data set has a positive correlation.
B: The slope of the line indicates that the bivariate data set has a negative correlation.
f) Which of the following is true?
A: If the independent variable increases by $1$ unit, then the dependent variable decreases by $409032.4$ units.
B: If the independent variable increases by $1$ unit, then the dependent variable increases by $14811$ units.
C: If the independent variable increases by $1$ unit, then the dependent variable increases by $409032.4$ units.
D: If the independent variable increases by $1$ unit, then the dependent variable decreases by $14811$ units.
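The centroid method used in this question rests on the fact that a least squares regression line always passes through the centroid $(\bar{x},\bar{y})$, so the slope is the rise over the run between the $y$-intercept $(0,b)$ and the centroid. As an illustration with different data, the sketch below applies the idea to the reading example from earlier, taking the data from its table and the intercept $30.1$ from its fitted equation; the function name `slope_from_centroid` is an assumption.

```python
def slope_from_centroid(y_intercept, centroid):
    """Slope of the line through (0, y_intercept) and the centroid."""
    mean_x, mean_y = centroid
    return (mean_y - y_intercept) / mean_x


pages = [25, 27, 29, 3, 13, 31, 18, 29, 29, 5]
vocabulary = [402, 440, 467, 76, 220, 487, 295, 457, 460, 106]
# The centroid is the point of means of the two variables.
centroid = (sum(pages) / len(pages), sum(vocabulary) / len(vocabulary))

print(round(slope_from_centroid(30.1, centroid), 1))  # prints 14.9
```

The result agrees with the slope in the equation $y=14.9x+30.1$ from the worked example.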
The prices of various second-hand Mitsubishi Lancers are shown below.
Age | $1$ | $2$ | $0$ | $5$ | $7$ | $4$ | $3$ | $4$ | $8$ | $2$ |
---|---|---|---|---|---|---|---|---|---|---|
Value (dollars) | $16000$ | $13000$ | $21990$ | $10000$ | $8600$ | $12500$ | $11000$ | $11000$ | $4500$ | $14500$ |
a) Find the equation of the Least Squares Regression Line for the price ($y$) in terms of age ($x$).
Round all values to the nearest integer.
b) State the value of the $y$-intercept.
c) The value of the $y$-intercept indicates that when a Mitsubishi Lancer is brand new, its value is, on average, _____ dollars.
d) Does the interpretation in the previous part make sense in this context?
A: Yes, when the independent variable has a value of zero, this is still within the data range and the value of the dependent variable makes sense.
B: No, when the independent variable has a value of zero, this is outside the data range and the value of the dependent variable does not make sense.
e) State the slope of the line.
f) Which of the following is true?
A: If the independent variable increases by $1$ unit, then the dependent variable increases by $1694$ units.
B: If the independent variable increases by $1$ unit, then the dependent variable increases by $18407$ units.
C: If the independent variable increases by $1$ unit, then the dependent variable decreases by $18407$ units.
D: If the independent variable increases by $1$ unit, then the dependent variable decreases by $1694$ units.