
2.6 Competing function model validation

Lesson

Introduction

Learning objectives

  • 2.6.A Construct linear, quadratic, and exponential models based on a data set.
  • 2.6.B Validate a model constructed from a data set.

Constructing linear, quadratic, and exponential models

When determining whether a linear or exponential model is a better fit for a given scenario, consider:

  1. How the function changes for each unit of x. Remember, for a linear function, the same amount will be added (or subtracted) to each output. For an exponential function, each output will be multiplied by the same number.
  2. If the trend is likely to continue. Consider that a decreasing linear function will eventually be negative while exponential decay will only get closer to 0. For growth, an exponential function will eventually grow rapidly, resulting in increasingly larger increases.
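As a rough illustration of the first point, the sketch below checks a list of outputs for a constant difference (linear) or a constant ratio (exponential). The two data lists and the helper names are hypothetical and are not taken from the lesson.

```python
def successive_differences(values):
    """Differences between consecutive outputs (constant for linear data)."""
    return [b - a for a, b in zip(values, values[1:])]


def successive_ratios(values):
    """Ratios of consecutive outputs (constant for exponential data)."""
    return [b / a for a, b in zip(values, values[1:])]


# Hypothetical outputs at x = 0, 1, 2, 3, 4 (assumed for illustration only).
linear_outputs = [5, 8, 11, 14, 17]        # 3 is added at each step
exponential_outputs = [5, 10, 20, 40, 80]  # each output is multiplied by 2

print("Differences:", successive_differences(linear_outputs))
print("Ratios:", successive_ratios(exponential_outputs))
```

With real data the differences or ratios are rarely exactly equal, so we would look for values that are approximately constant.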

Exploration

Decide whether you think the function is linear then drag the slider to investigate further. Try again by clicking "Try a new function."

  1. What did you notice as you dragged the slider?

Exponential functions can be difficult to identify from a graph or table when the amount of change is small relative to the size of the numbers.

A quadratic function is a polynomial function of degree 2. A quadratic function can be written in the form f(x)=ax^2+bx+c where a, b, and c are real numbers.

From the graph of a quadratic function, called a parabola, we can identify key features including domain and range, x- and y-intercepts, increasing and decreasing intervals, positive and negative intervals, average rate of change, and end behavior. The parabola also has the following two features that help us identify it, and that we can use when drawing the graph:

Axis of symmetry

A line that divides a figure into two parts, such that the reflection of either part across the line maps precisely onto the other part. For a parabola, the axis of symmetry is a vertical line passing through the vertex.

Vertex

The point where the parabola crosses the axis of symmetry. The vertex is either a maximum or minimum on the parabola.


Examples

Example 1

Determine whether an exponential or linear model would better model the data. Justify your choice.

a

A real estate agent earns 3\% of the value of every house sold.

Worked Solution
Create a strategy

Creating a model will help determine if this situation is linear or exponential.

Apply the idea

Let x represent the value of the house sold. Then, the amount earned by the real estate agent can be modeled with f(x)=0.03x.

This situation is better modeled with a linear function because it has a constant rate of change.

Reflect and check

We can calculate the amount the agent earns by finding 3\% of the house value. If we put a few different house values in a table, we get

House value | \$100\,000 | \$500\,000 | \$1\,000\,000
Earnings | \$3\,000 | \$15\,000 | \$30\,000

which shows us that the agent earns \$3\,000 for every \$100\,000 of house value sold. This is a linear rate of earnings.

We can also graph the earnings dependent on house value to see that they form a line:

[Graph: Agent earnings. Horizontal axis: House value. Vertical axis: Earnings.]
b

The median price of a home sold in the U.S., y, from 2019 to the beginning of 2022 is shown in the graph, where x represents the number of years since 2019.

[Graph: Median U.S. house price since 2019. Horizontal axis: x. Vertical axis: y.]
Worked Solution
Create a strategy

We're given this context on a graph so we can consider what it would look like to draw a line versus a curve through the points.

Apply the idea

The graph appears to be curved similarly to exponential growth, so we will choose exponential growth for our model.

The small amount of change in 2019 is similar to the horizontal asymptote of an exponential function with a vertical translation of about 375\,000. If we draw an exponential curve through the data, it might look like this:

[Graph: Median U.S. house price since 2019, with an exponential curve drawn through the data. Horizontal axis: x. Vertical axis: y.]
Reflect and check

To verify that a linear function is not a better fit, we can see that a line would not be near a majority of the points:

[Graph: Median U.S. house price since 2019, with a line drawn through the data. Horizontal axis: x. Vertical axis: y.]

Example 2

The cost of college tuition in the United States has increased by 1200\% since 1980. Consider the average annual tuition and fees presented in the table:

Year | Public university | Private university
1980 | \$1\,856 | \$10\,227
1990 | \$2\,750 | \$16\,590
2000 | \$3\,706 | \$21\,698
2010 | \$5\,814 | \$25\,250
2020 | \$9\,403 | \$34\,059
a

Create a linear model to represent the average annual tuition cost for each type of university.

Worked Solution
Create a strategy

A linear model will have a constant slope. We can find the average rate of change from 1980 to 2020 by using the average rate of change formula \dfrac{f(b)-f(a)}{b-a}.

Apply the idea

The average rate of change from 1980 to 2020 for public university average annual tuition was \dfrac{9403-1856}{2020-1980}=\$188.68 per year. If we let x represent the years since 1980, we can use the slope-intercept form to get: y=188.68x+1856

[Graph: Public university cost. Horizontal axis: Years since 1980. Vertical axis: Tuition cost.]

We can see from the graph that this model overestimates the tuition from 1990 to 2010.

The average rate of change from 1980 to 2020 for private university average annual tuition was \dfrac{34\,059-10\,227}{2020-1980}=\$595.80 per year. If we let x represent the years since 1980, we can use the slope-intercept form to get: y=595.8x+10\,227

[Graph: Private university cost. Horizontal axis: Years since 1980. Vertical axis: Tuition cost.]

We can see from the graph that this model is a fairly good fit, except in 2010, where it overestimates tuition (the model gives about \$28\,101 compared to the actual \$25\,250).

Reflect and check

A better model for public university tuition and fees might be the line of best fit: f(x)=181.58x+1074.2, which has a similar slope but a lower y-intercept.

[Graph: Public university cost, with the line of best fit. Horizontal axis: x. Vertical axis: y.]
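The line of best fit for the public university data, with slope about 181.58 and y-intercept about 1074.2, can be checked with a short least-squares calculation. This is a minimal sketch that assumes the usual least-squares formulas; the function name `least_squares_line` is made up for illustration, and the data come from the tuition table.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years_since_1980 = [0, 10, 20, 30, 40]
public_tuition = [1856, 2750, 3706, 5814, 9403]

slope, intercept = least_squares_line(years_since_1980, public_tuition)
print(round(slope, 2), round(intercept, 1))  # 181.58 1074.2
```

Unlike the model built from only the 1980 and 2020 values, this line uses all five data points, which is why its y-intercept differs from the 1980 tuition.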
b

Create an exponential model to represent the average annual tuition cost for each type of university.

Worked Solution
Create a strategy

An exponential model in the form f(x)=ab^x has an initial value and a growth factor. We can find the growth factor by finding the ratio of tuition costs for two different points.

Apply the idea

The growth factor from 1980 to 2020 for public university average annual tuition was \dfrac{9403}{1856}=5.066 over a span of 40 years. We can use properties of exponents to find the annual growth factor for our model: (5.066)^\frac{1}{40}=1.041. This tells us that the tuition increased an average of 4.1\% each year from 1980 to 2020. If we let x represent the years since 1980, we can create an exponential model: y=1856(1.041)^x

[Graph: Public university cost. Horizontal axis: Years since 1980. Vertical axis: Tuition cost.]

We can see from the graph that this model closely follows the data.

The growth factor from 1980 to 2020 for private university average annual tuition was \dfrac{34\,059}{10\,227}=3.330 over a span of 40 years. We can use properties of exponents to find the annual growth factor for our model: (3.330)^\frac{1}{40}=1.031. This tells us that the tuition increased an average of 3.1\% each year from 1980 to 2020. If we let x represent the years since 1980, we can create an exponential model: y=10\,227(1.031)^x

[Graph: Private university cost. Horizontal axis: Years since 1980. Vertical axis: Tuition cost.]

We can see from the graph that this model underestimates tuition in 1990 and 2000.
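The annual growth factors in this part can be reproduced with a few lines of code. The sketch below takes the 40th root of the overall growth factor, as in the worked solution; the function name `annual_growth_factor` is an assumption made for illustration.

```python
def annual_growth_factor(start_value, end_value, years):
    """Growth factor per year, assuming the same factor applies every year."""
    overall_factor = end_value / start_value
    return overall_factor ** (1 / years)


# Tuition values for 1980 and 2020 from the table.
public_factor = annual_growth_factor(1856, 9403, 40)
private_factor = annual_growth_factor(10227, 34059, 40)

print(round(public_factor, 3))   # 1.041
print(round(private_factor, 3))  # 1.031
```

Because only the first and last values are used, each exponential model passes through the 1980 and 2020 data points (up to rounding) but is not guaranteed to be close to the points in between.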

c

Determine if a linear model or an exponential model would be better to predict tuition for the next decade. Explain the differences between the models and what they tell us about the context.

Worked Solution
Create a strategy

We have linear models in part (a) and exponential models in part (b). We should choose our model based on how well they fit the current data and if we think the trend will continue into the next decade.

Apply the idea

The exponential model looked to be a better fit for public university tuition, and the linear model looked to be a better fit for private university tuition.

Using y=1856(1.041)^x to predict public university tuition in 2030 (x=50), ten years after the last data point, we find that the average annual public university tuition would be about y=1856(1.041)^{50}=\$13\,839.31.

Using y=595.8x+10\,227 to predict private university tuition in 2030 (x=50), we find that the average annual private university tuition would be 595.8(50)+10\,227=\$40\,017.

The change in exponential models increases over time, but linear models stay consistent. Since the private university tuition had a more linear growth, we might expect their tuition to continue that trend. The public university tuition experienced larger and larger increases each decade which is more consistent with exponential growth. If these trends continued, we would see public university tuition catch up to private university tuition.
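To see these predictions side by side, the sketch below evaluates the exponential model for public tuition and the linear model for private tuition 50 years after 1980, then searches for the first whole year at which the public model reaches the private model. The function names are made up for illustration, and the catch-up search simply assumes both trends continue unchanged.

```python
def public_exponential(x):
    """Exponential model for public tuition; x is years since 1980."""
    return 1856 * 1.041 ** x


def private_linear(x):
    """Linear model for private tuition; x is years since 1980."""
    return 595.8 * x + 10227


x_2030 = 50
public_2030 = public_exponential(x_2030)
private_2030 = private_linear(x_2030)
print(f"Public model at x=50:  ${public_2030:,.2f}")
print(f"Private model at x=50: ${private_2030:,.2f}")

# First whole number of years since 1980 at which the public model
# is at least as large as the private model (searching up to 200 years).
catch_up_year = next(
    x for x in range(0, 201) if public_exponential(x) >= private_linear(x)
)
print("Public model catches up", catch_up_year, "years after 1980")
```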

Reflect and check

It's more likely that private university tuition will increase exponentially as the cost of goods also increases by a percent each year. However, if the trends continue the way we've seen them, here's what tuition would look like from 1980 to 2080:

[Graph: University cost. Horizontal axis: Years since 1980. Vertical axis: Tuition cost.]

Example 3

Consider the quadratic function: f(x)=x^2-2x+1

a

Graph the function.

Worked Solution
Create a strategy

We can create a table of values that satisfy f(x) and use it to help graph the function. It can be useful to choose values for x that are positive and negative, as well as x=0:

x | -2 | -1 | 0 | 1 | 2 | 3 | 4
f(x) |  |  |  |  |  |  |

To complete the table, evaluate the function for each x-value.

Here is how we can obtain f(-2):

f(-2)=(-2)^2-2(-2)+1

f(-2)=4+4+1

f(-2)=9

Repeat this process for each x-value in the table.

x | -2 | -1 | 0 | 1 | 2 | 3 | 4
f(x) | 9 | 4 | 1 | 0 | 1 | 4 | 9

Now we can use these points to graph the quadratic function f(x).
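The table of values can also be generated with a short program. This is a small sketch of the same evaluation steps, using the x-values chosen in the strategy.

```python
def f(x):
    """The quadratic function f(x) = x^2 - 2x + 1."""
    return x ** 2 - 2 * x + 1


x_values = [-2, -1, 0, 1, 2, 3, 4]
table = [(x, f(x)) for x in x_values]

for x, y in table:
    print(x, y)
```

The outputs mirror each other on either side of x=1, which matches the symmetry we expect from a parabola.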

Apply the idea
[Graph of f(x)=x^2-2x+1. Horizontal axis: x. Vertical axis: y.]
Reflect and check

Having the vertex in your table is useful, since it tells you where the parabola will change direction. Sometimes the table values you select will not include the vertex of the function, depending on the quadratic function being graphed. If you plot your initial table values and find you are unsure where the parabola changes direction, you can add additional values to your table until you can identify where f(x) changes direction.

Note that the quadratic function has one x-intercept, at x=1.

b

State the axis of symmetry.

Worked Solution
Create a strategy

The axis of symmetry passes through the point where the y-values change from decreasing to increasing.

Apply the idea
[Graph of f(x). Horizontal axis: x. Vertical axis: f(x).]

The axis of symmetry is x=1.
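The axis of symmetry can also be found without a graph. The sketch below assumes the standard result, which is not derived in this lesson, that a quadratic f(x)=ax^2+bx+c has its axis of symmetry at x=-\dfrac{b}{2a}; the function name is made up for illustration.

```python
def axis_of_symmetry(a, b):
    """x-value of the axis of symmetry for f(x) = ax^2 + bx + c (a != 0)."""
    return -b / (2 * a)


# Coefficients of f(x) = x^2 - 2x + 1.
a, b, c = 1, -2, 1

axis_x = axis_of_symmetry(a, b)
vertex = (axis_x, a * axis_x ** 2 + b * axis_x + c)

print("Axis of symmetry: x =", axis_x)
print("Vertex:", vertex)
```

The vertex lies on the axis of symmetry, so evaluating f at that x-value gives the vertex's y-coordinate.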

Idea summary

Linear functions can help model relations with a near-constant rate of change, and exponential functions can help model relations where the outputs change by a near-constant factor, so the rate of change itself increases or decreases.

Comparing models

Exploration

Consider the table below:

x | y=3x | y=3x^2 | y=3^x
1 | 3 | 3 | 3
2 | 6 | 12 | 9
3 | 9 | 27 | 27
5 | 15 | 75 | 243
  1. Compare the three functions and how they change as x increases.

The way a function is represented can affect the characteristics we are able to identify for the function. Different representations can highlight or hide certain characteristics. Remember that key features of functions include:

  • domain and range
  • x- and y-intercepts
  • maximum or minimum value(s)
  • average rate of change over various intervals
  • end behavior
  • positive and negative intervals
  • increasing and decreasing intervals
  • asymptote(s)
  • vertex
  • axis of symmetry

One way to compare functions is to look at growth rates as the x-values increase over regular intervals. In order to compare the growth rates of quadratics with those of exponential or linear functions, we will examine only the increasing interval of a quadratic function.

When the leading coefficient of the quadratic equation is positive, the parabola opens upward. In this case, we know y increases at an increasing rate as x approaches infinity.

Since a linear function increases at a constant rate and the quadratic function increases at an increasing rate as x increases, eventually the quadratic function will increase faster than the linear function.

Next, we need to examine how an exponential growth function compares to the increasing portion of the quadratic function, since both functions increase at an increasing rate. Consider a situation where we compare the increasing interval of the quadratic function g(x) with a positive leading coefficient, to an exponential growth function h(x), as shown in the graph.

[Graph of the quadratic function g(x) and the exponential growth function h(x). Horizontal axis: x. Vertical axis: y.]

Notice starting at x=0, g(x) is greater than h(x) and is increasing at a greater rate. But, as x continues to increase, the quadratic function g(x) is increasing at a slower rate than the exponential function, and eventually the exponential function will overtake the quadratic function.

An exponential growth function will eventually exceed a linear or quadratic growth function as values of x become sufficiently large.
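To see this numerically, the sketch below extends the comparison of y=3x, y=3x^2, and y=3^x from the exploration table to larger values of x. The extra x-values (10 and 20) are chosen only for illustration.

```python
def linear(x):
    return 3 * x


def quadratic(x):
    return 3 * x ** 2


def exponential(x):
    return 3 ** x


# Each row is (x, 3x, 3x^2, 3^x).
rows = [(x, linear(x), quadratic(x), exponential(x)) for x in [1, 2, 3, 5, 10, 20]]

for row in rows:
    print(row)
```

At x=3 the quadratic and exponential outputs are equal, and for every larger x in the list the exponential output is the greatest of the three.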

Examples

Example 4

Consider the functions shown below. Assume that the domain of f is all real numbers.

  • Function 1:

    x | -1 | 0 | 1 | 2 | 3 | 4 | 5
    f\left(x\right) | -3.75 | -2 | -0.25 | 1.5 | 3.25 | 5 | 6.75
  • Function 2:

    [Graph of Function 2, g\left(x\right). Horizontal axis: x. Vertical axis: y.]
a

Determine which function has a higher y-intercept.

Worked Solution
Create a strategy

Remember that the y-intercept of a function occurs when x=0. We can use this to evaluate the y-intercept of f and identify the y-intercept of g.

Apply the idea

For f, we can see from the table that f\left(0\right) = -2.

For g, we can see from the graph that g\left(0\right) = -3.

So the y-intercept of f is the point \left(0, -2\right) and the y-intercept of g is the point \left(0, -3\right), and therefore f has a higher y-intercept.

b

Find the average rate of change for each function over the following intervals:

  • 0 \leq x \leq 1
  • 1 \leq x \leq 4
  • 4 \leq x \leq 5
Worked Solution
Create a strategy

For Function 1, we can find the values of f\left(0\right), f\left(1\right), f\left(4\right), and f\left(5\right) from the table of values. For Function 2 we will need to look at the graph and estimate the values of g\left(0\right), g\left(1\right), g\left(4\right), and g\left(5\right).

Apply the idea

First, we can consider the interval 0 \leq x \leq 1.

For Function 1, using the table of values, we can see that f\left(0\right)=-2 and f\left(1\right)=-0.25.

So, the average rate of change of Function 1 over 0 \leq x \leq 1 is:\dfrac{f(1)-f(0)}{1-0}=\dfrac{-0.25-\left(-2\right)}{1-0}=\dfrac{1.75}{1}=1.75

For Function 2, using the graph, we can see that g\left(0\right)=-3 and g\left(1\right)=-4.

So, the average rate of change of Function 2 over 0 \leq x \leq 1 is:\dfrac{g(1)-g(0)}{1-0}=\dfrac{-4-\left(-3\right)}{1-0}=-\dfrac{1}{1}=-1

Next, we can consider the interval 1 \leq x \leq 4.

For Function 1, using the table of values, we can see that f\left(1\right)=-0.25 and f\left(4\right)=5.

So, the average rate of change of Function 1 over 1 \leq x \leq 4 is:\dfrac{f(4)-f(1)}{4-1}=\dfrac{5-\left(-0.25\right)}{4-1}=\dfrac{5.25}{3}=1.75

For Function 2, using the graph, we can see that g\left(1\right)=-4 and g\left(4\right)=5.

So, the average rate of change of Function 2 over 1 \leq x \leq 4 is:\dfrac{g(4)-g(1)}{4-1}=\dfrac{5-\left(-4\right)}{4-1}=\dfrac{9}{3}=3

Lastly, we can consider the interval 4 \leq x \leq 5.

For Function 1, using the table of values, we can see that f\left(4\right)=5 and f\left(5\right)=6.75.

So, the average rate of change of Function 1 over 4 \leq x \leq 5 is:\dfrac{f(5)-f(4)}{5-4}=\dfrac{6.75-5}{5-4}=\dfrac{1.75}{1}=1.75

For Function 2, using the graph, we can see that g\left(4\right)=5 and g\left(5\right)=12.

So, the average rate of change of Function 2 over 4 \leq x \leq 5 is:\dfrac{g(5)-g(4)}{5-4}=\dfrac{12-5}{5-4}=\dfrac{7}{1}=7

Reflect and check

We can notice from the table of values that f\left(x\right) is increasing at a constant rate. This means that regardless of the interval we consider, the average rate of change remains the same.
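The same check can be carried out in code. This sketch stores the table of values for Function 1 in a dictionary and applies the average rate of change formula to each interval; the function name `average_rate_of_change` is made up for illustration.

```python
def average_rate_of_change(values, a, b):
    """Average rate of change of a tabulated function over a <= x <= b."""
    return (values[b] - values[a]) / (b - a)


# Table of values for Function 1.
function_1 = {-1: -3.75, 0: -2, 1: -0.25, 2: 1.5, 3: 3.25, 4: 5, 5: 6.75}

intervals = [(0, 1), (1, 4), (4, 5)]
rates = [average_rate_of_change(function_1, a, b) for a, b in intervals]

print(rates)  # [1.75, 1.75, 1.75]
```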

c

Using part (b), determine which function will be greater as x approaches positive infinity.

Worked Solution
Create a strategy

We can consider the average rate of change calculated in part (b) to determine which function will be greater for large values of x.

Apply the idea

From part (b), we saw that f\left(x\right) was a linear function with a constant rate of change of 1.75. We calculated that g\left(x\right) had an increasing rate of change as x increased. So, we can see that Function 2 will be far greater than Function 1 as x approaches positive infinity.

Reflect and check

A concave up quadratic function will grow faster than a linear function as x approaches positive infinity.

Example 5

Consider three options for earning money, each represented by a function in one of the following ways:

A figure showing 3 options to earn money. Option 1 shows the statement: You are given 2 dollars each day. Option 2 shows a table with 2 columns titled Days and Total Amount and with 6 rows. The data is as follows: First column: 1, 2, 3, 4, 5,6; Second column: 1 dollar, 4 dollars, 9 dollars, 16 dollars, 25 dollars, 36 dollars. Option 3 shows a first quadrant coordinate plane with the x axis labeled Days and the y axis labeled Total Amount in dollars. The points (1, 2), (2, 4), (3, 8), (4, 16), and (5, 32) are plotted on the graph.

Note: Option 3 starts with \$2 on day one and doubles each day after this.

a

Compare the average rate of change of each function over the intervals 2 \leq x \leq 3 and 4 \leq x \leq 5.

Worked Solution
Create a strategy

To find the average rate of change from a given function over the interval a \leq x \leq b, we can find the change in the value of the dependent variable f(b)-f(a) per change in value in the independent variable b-a, or:

\dfrac{f(b)-f(a)}{b-a}

for each of the options.

Apply the idea

For Option 1, f(2)=2 \cdot 2 = 4 and f(3)=2 \cdot 3=6.

The average rate of change for Option 1 over the interval 2 \leq x \leq 3 is \dfrac{f(3)-f(2)}{3-2} = \dfrac{6-4}{3-2} = \dfrac{2}{1} = 2

For Option 2, f(2)=4 and f(3)=9.

The average rate of change for Option 2 over the interval 2 \leq x \leq 3 is \dfrac{f(3)-f(2)}{3-2} = \dfrac{9-4}{3-2} = \dfrac{5}{1} = 5

For Option 3, f(2)=4 and f(3)=8.

The average rate of change for Option 3 over the interval 2 \leq x \leq 3 is \dfrac{f(3)-f(2)}{3-2} = \dfrac{8-4}{3-2} = \dfrac{4}{1} = 4

Option 2 has the greatest average rate of change over 2 \leq x \leq 3, at \$5 per day.

For Option 1, f(4)=2 \cdot 4 = 8 and f(5)=2 \cdot 5=10.

The average rate of change for Option 1 over the interval 4 \leq x \leq 5 is \dfrac{f(5)-f(4)}{5-4} = \dfrac{10-8}{5-4} = \dfrac{2}{1} = 2

For Option 2, f(4)=16 and f(5)=25.

The average rate of change for Option 2 over the interval 4 \leq x \leq 5 is \dfrac{f(5)-f(4)}{5-4} = \dfrac{25-16}{5-4} = \dfrac{9}{1} = 9

For Option 3, f(4)=16 and f(5)=32.

The average rate of change for Option 3 over the interval 4 \leq x \leq 5 is \dfrac{f(5)-f(4)}{5-4} = \dfrac{32-16}{5-4} = \dfrac{16}{1} = 16

The average rate of change of Option 1 remained \$2 per day over both intervals, while the average rates of change of Options 2 and 3 both increased from the first to the second intervals. Option 3 has the greatest rate of change over 4 \leq x \leq 5, at \$16 per day.

b

Find the equation that represents each option, where x is the number of days that have passed.

Worked Solution
Create a strategy

For each option, we can consider how the total amount of money changes as the days progress and derive an equation to represent the relationship.

Apply the idea

For Option 1, we saw in part (a) that we had a constant rate of change regardless of the interval we considered. So, Option 1 can be represented by the linear function, f\left(x\right)=2x.

Now, observing the table of values for Option 2, we can see that the total amount is just the square of the number of days passed. So, Option 2 can be represented by the function f\left(x\right)=x^2.

Finally, the relationship for Option 3 is represented in the graph, but also described to us. Since we are told that the function starts at \$2 and is doubled each day, we can see that Option 3 is just represented by the function f\left(x\right)=2^x.

Reflect and check

If the relationship between the days passed and the total amount weren't directly obvious in Option 2, we could have tested the data provided in the table to rule out a linear or exponential relationship.

For a linear relationship, the average rate of change between any two points must be equal. We already discovered in part (a) that this wasn't true for Option 2. So, we could have then tested if it represented an exponential relationship.

For an exponential relationship, the ratio between the outputs of any two points one unit apart must be equal. We can see that for Option 2, \dfrac{4}{1} \neq \dfrac{9}{4}.

Therefore, we could see that Option 2 represented neither a linear nor an exponential relationship.

c

Find the value of each option at 8 days, 12 days, and 14 days.

Worked Solution
Create a strategy

Construct a table of values with the amounts of money gained with each option.

Apply the idea
Days | Option 1 Total | Option 2 Total | Option 3 Total
1 | \$2 | \$1 | \$2
2 | \$4 | \$4 | \$4
3 | \$6 | \$9 | \$8
4 | \$8 | \$16 | \$16
5 | \$10 | \$25 | \$32
6 | \$12 | \$36 | \$64
7 | \$14 | \$49 | \$128
8 | \$16 | \$64 | \$256
9 | \$18 | \$81 | \$512
10 | \$20 | \$100 | \$1\,024
11 | \$22 | \$121 | \$2\,048
12 | \$24 | \$144 | \$4\,096
13 | \$26 | \$169 | \$8\,192
14 | \$28 | \$196 | \$16\,384

At 8 days, Option 1 will make \$16, Option 2 will make \$64, and Option 3 will make \$256.

At 12 days, Option 1 will make \$24, Option 2 will make \$144, and Option 3 will make \$4\,096.

At 14 days, Option 1 will make \$28, Option 2 will make \$196, and Option 3 will make \$16\,384.

Reflect and check

We could calculate the total amount of money on days 8, 12 and 14 using the functions found in part (b), instead of constructing a table.
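For instance, a short sketch using the three functions from part (b), f(x)=2x, f(x)=x^2, and f(x)=2^x, gives the totals directly. The function names `option_1`, `option_2`, and `option_3` are made up for illustration.

```python
def option_1(x):
    """Linear option: 2 dollars each day."""
    return 2 * x


def option_2(x):
    """Quadratic option: the square of the number of days."""
    return x ** 2


def option_3(x):
    """Exponential option: starts at 2 dollars and doubles each day."""
    return 2 ** x


# Totals (Option 1, Option 2, Option 3) at 8, 12, and 14 days.
totals = {days: (option_1(days), option_2(days), option_3(days)) for days in (8, 12, 14)}

for days, amounts in totals.items():
    print(days, amounts)
```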

d

Determine which option will be greater for larger and larger values of x.

Worked Solution
Create a strategy

Use the table comparison from part (c) to determine which option will be greater for larger and larger values of x.

Apply the idea

As x gets larger and larger, we can see that Option 3, the exponential option, will be far greater than Options 1 or 2.

Reflect and check

An exponential growth function will eventually exceed a linear or quadratic function as values of x become sufficiently large.

Idea summary

It is important to be able to compare the key features of functions whether they are represented in similar or different ways:

  • domain and range
  • x- and y-intercepts
  • maximum or minimum value(s)
  • average rate of change
  • end behavior
  • positive and negative intervals
  • increasing and decreasing intervals
  • asymptote(s)
  • vertex
  • axis of symmetry

Fitting functions to data

Exploration

Use the linear and exponential models to fit the data on the graph.

  1. Which function fits the data better? How do you know?

Sometimes we need to consider fitting something other than a line of best fit or regression line, which both refer to a linear regression model, to model data. A fitted function could include another type of function, such as an exponential function.

We already learned about the correlation coefficient, r, a statistic that describes both the strength and direction of a linear association. But, we also need a measure that determines how well our fitted function can actually predict an outcome.

This value is known as the coefficient of determination, or the value R^2, and is a measure of the proportion of the variation in the dependent variable that is predicted by the independent variable. Since the coefficient of determination represents a proportion it will only ever return a value between 0 and 1.

In the case where the fitted function is a linear model with one independent variable, R^2 is equal to our correlation coefficient squared, r^2. This only holds true for linear models with one independent variable and is not the case when the fitted function is from any other function family, e.g. exponential, quadratic, etc.

Coefficient of determination

A measurement used to explain how much the variability of one quantity can be explained by its relationship to another quantity

Bivariate data can be modeled with a fitted function also called a regression model. Depending on the goodness of fit, measured with the coefficient of determination (R^2), a regression function may pass exactly through all of the points, some of the points, or none of the points.

[Graph: scatter plot with a fitted exponential curve. Horizontal axis: x. Vertical axis: y. Exponential regression: R^2=0.763]

R^2 for the exponential regression shown means that 76.3\% of the variation in the dependent variable is explained by the variation in the independent variable. The closer R^2 is to 1, the more that the variation in the dependent variable is explained by the variation in the independent variable.
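Technology normally reports R^2 for us, but it can help to see one way the value might be computed. The sketch below assumes a common definition that is not stated in this lesson: R^2 = 1 - (sum of squared residuals) / (total sum of squares about the mean). The data set is hypothetical, and the fitted function y=2.1x+1.1 is the least-squares line for that hypothetical data.

```python
def coefficient_of_determination(actual, predicted):
    """R^2 computed as 1 - SS_res / SS_tot (an assumed, commonly used definition)."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the fitted function.
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # Total variation of the data about its mean.
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


# Hypothetical data (not from the lesson) and its least-squares line.
xs = [0, 1, 2, 3]
ys = [1, 3, 6, 7]
predicted = [2.1 * x + 1.1 for x in xs]

r_squared = coefficient_of_determination(ys, predicted)
print(round(r_squared, 4))
```

The closer the predicted values are to the actual values, the smaller the unexplained variation and the closer the result is to 1.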

Examples

Example 6

A teacher recorded the number of days since a student last studied for an exam and their score out of a possible 80 points on the exam.

Number of days since studying | 3 | 2 | 6 | 4 | 4 | 1 | 6 | 3 | 4 | 2
Exam score | 64 | 59 | 42 | 57 | 58 | 72 | 33 | 63 | 55 | 62
a

Describe the association between the number of days since studying and the exam score.

Worked Solution
Create a strategy

Construct a scatterplot to get a visual of the data.

[Scatterplot. Horizontal axis: Days since studying. Vertical axis: Score.]

Then consider the form, strength, and direction.

Apply the idea

The data appears to have a strong, negative, linear association.

b

Calculate the line of best fit and correlation coefficient. Interpret the correlation coefficient.

Worked Solution
Create a strategy
  1. Enter the x- and y-values in two separate columns:

    A screenshot of the GeoGebra statistics tool showing the numbers 3, 2, 6, 4, 4, 1, 6, 3, 4, and 2 entered in column A, rows 1 to 10 and the numbers 64, 59, 42, 57, 58, 72, 33, 63, 55, and 62 entered in column B, rows 1 to 10. Speak to your teacher for more details.
  2. Highlight the data and select Two Variable Regression Analysis:

    A screenshot of the GeoGebra statistics tool showing the numbers 3, 2, 6, 4, 4, 1, 6, 3, 4, and 2 in column A, rows 1 to 10 and the numbers 64, 59, 42, 57, 58, 72, 33, 63, 55, and 62 in column B, rows 1 to 10. The cells from column A, rows 1 to 10, and column B, rows 1 to 10, are selected. The menu from the second leftmost icon is shown. Speak to your teacher for more details.
  3. Select Show Statistics to see the correlation coefficient, r:

    A screenshot of the GeoGebra statistics tool showing the following: On the left side: the numbers 3, 2, 6, 4, 4, 1, 6, 3, 4, and 2 in column A, rows 1 to 10 and the numbers 64, 59, 42, 57, 58, 72, 33, 63, 55, and 62 in column B, rows 1 to 10. The cells from column A, rows 1 to 10, and column B, rows 1 to 10, are selected. On the right side: a list of statistical values is shown. Speak to your teacher for more details.
  4. Choose Linear under the Regression Model drop down menu to find the line of best fit:

    A screenshot of the GeoGebra statistics tool showing the following: On the left side: the numbers 3, 2, 6, 4, 4, 1, 6, 3, 4, and 2 in column A, rows 1 to 10 and the numbers 64, 59, 42, 57, 58, 72, 33, 63, 55, and 62 in column B, rows 1 to 10. The cells from column A, rows 1 to 10, and column B, rows 1 to 10, are selected. On the right side: a scatterplot and the line of best fit are shown. Speak to your teacher for more details.
Apply the idea

The equation of the line of best fit is y=-6.22x+78

The correlation coefficient is r=-0.9115, meaning that there is a strong negative correlation between the number of days since a student last studied and their score on the exam.
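If a statistics tool is not available, the same values can be approximated with a short calculation. This is a minimal sketch that assumes the standard least-squares and Pearson correlation formulas; it is not a description of how the GeoGebra tool works internally, and the variable names are made up for illustration.

```python
import math

days = [3, 2, 6, 4, 4, 1, 6, 3, 4, 2]
scores = [64, 59, 42, 57, 58, 72, 33, 63, 55, 62]

n = len(days)
mean_x = sum(days) / n
mean_y = sum(scores) / n

# Sums of squared deviations and of products of deviations.
sxx = sum((x - mean_x) ** 2 for x in days)
syy = sum((y - mean_y) ** 2 for y in scores)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, scores))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)

print(round(slope, 2), round(intercept, 2), round(r, 4))
```

The slope and correlation coefficient agree with the values above, and the intercept comes out close to 78 before rounding.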

Reflect and check

The correlation coefficient provides statistical evidence for the association.

c

Interpret the meaning of the slope and y-intercept of the line of best fit in context of the data.

Worked Solution
Create a strategy

From part (b) we know that the equation of the line of best fit is y=-6.22x+78 which tells us the slope is -6.22 and the y-intercept is 78.

Apply the idea

The slope of -6.22 means the predicted exam score drops by about 6.22 points for each additional day since the student last studied.

The y-intercept tells us that a student who studied on the day of the exam (0 days since studying) has a predicted score of 78 according to the linear model.

Reflect and check

Matching the slope and the y-intercept to their respective units is a good strategy for interpreting their meaning in context. \text{slope}=\dfrac{\text{rise}}{\text{run}}=\dfrac{-6.22}{1}

The quantity on the y-axis represents the "rise" and the quantity on the x-axis represents the "run". So the slope represents a change of -6.22 points in exam score for every 1 day since studying.

The y-intercept can be written as an ordered pair \left(x,y\right)=\left(0,78\right) where x is the number of days since studying and y is the exam score.

Example 7

The population P of fish in a small lake over t years is shown in the table below:

Years (t) | Fish Population (P)
0 | 1000
0.5 | 550
1 | 500
1.5 | 425
1.75 | 350
2 | 290
2.25 | 210
2.5 | 160
3.75 | 100
a

Determine whether a linear or exponential model best fits the relationship between the years, t, and the population of fish P.

Worked Solution
Create a strategy

Plot the data on a coordinate plane and examine the shape of the data.

[Scatterplot: Lake Fish Population. Horizontal axis: Years t. Vertical axis: Population P.]
Apply the idea

After examining the plot, we can see that an exponential model would fit the data best.

b

Calculate the regression model for the data and use it to predict the population of fish in the lake after 5 years.

Worked Solution
Create a strategy

Use technology to calculate the exponential regression, then use the equation to determine the population when t=5.

  1. Enter the x- and y-values in two separate columns, then highlight the data and select Two Variable Regression Analysis :

    A screenshot of the GeoGebra statistics tool showing the numbers 0, 0.5, 1, 1.5, 1.75, 2, 2.25, 2.5, and 3.75 entered in column A, rows 1 to 9 and the numbers 1000, 550, 500, 425, 350, 290, 210, 160, and 100 entered in column B, rows 1 to 9. The cells from column A, rows 1 to 9, and column B, rows 1 to 9, are selected. The menu from the second leftmost icon is shown. Speak to your teacher for more details.
  2. Choose Exponential under the Regression Model drop down menu to find the curve of best fit:

    A screenshot of the GeoGebra statistics tool showing the following: On the left side: the numbers 0, 0.5, 1, 1.5, 1.75, 2, 2.25, 2.5, and 3.75 in column A, rows 1 to 9 and the numbers 1000, 550, 500, 425, 350, 290, 210, 160, and 100 in column B, rows 1 to 9. The cells from column A, rows 1 to 9, and column B, rows 1 to 9, are selected. On the right side: a scatterplot and the best fit curve are shown. Speak to your teacher for more details.
Apply the idea

The function that fits the model best is P=910.9e^{-0.61t}

If t=5, then P=910.9e^{-0.61 \cdot 5}=43.14. This means that according to the regression model, we can expect there to be about 43 fish remaining in the lake after 5 years.
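One common way to fit an exponential model by hand is to fit a least-squares line to the natural logarithm of the outputs and then convert back. The sketch below takes that approach for the fish data; whether the statistics tool uses exactly this method is an assumption, and the variable names are made up for illustration. For this data set the result is close to the model P=910.9e^{-0.61t}.

```python
import math

years = [0, 0.5, 1, 1.5, 1.75, 2, 2.25, 2.5, 3.75]
population = [1000, 550, 500, 425, 350, 290, 210, 160, 100]

# If P = a * e^(k t), then ln(P) = ln(a) + k t, which is linear in t.
log_population = [math.log(p) for p in population]

n = len(years)
mean_t = sum(years) / n
mean_log = sum(log_population) / n

stt = sum((t - mean_t) ** 2 for t in years)
stl = sum((t - mean_t) * (lp - mean_log) for t, lp in zip(years, log_population))

k = stl / stt                         # slope of the fitted line, the growth rate
a = math.exp(mean_log - k * mean_t)   # initial value, from the line's intercept

print(f"P = {a:.1f} * e^({k:.2f} t)")
print("Prediction at t = 5:", round(a * math.exp(k * 5)))
```

The prediction from the unrounded coefficients can differ slightly from one made with the rounded model, since small changes in the exponent have a noticeable effect.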

c

Interpret the coefficient of determination for the regression model.

Worked Solution
Create a strategy

From the calculated regression model in the calculator, select Show Statistics to see the coefficient of determination, R^2:

A screenshot of the GeoGebra statistics tool showing the following: On the left side: the numbers 0, 0.5, 1, 1.5, 1.75, 2, 2.25, 2.5, and 3.75 in column A, rows 1 to 9 and the numbers 1000, 550, 500, 425, 350, 290, 210, 160, and 100 in column B, rows 1 to 9. On the middle: a list of statistical values is shown. On the right side: a scatterplot and the best fit curve are shown. Speak to your teacher for more details.
Apply the idea

The coefficient of determination is 0.9491, meaning that 94.91 \% of the variation in the fish population is explained by the variation in the year that the measurement was taken.

Reflect and check

In our statistics table we can see we produced a value for the correlation coefficient r and the coefficient of determination R^2.

Squaring the value of r from the statistics table gives r^2=0.8268. However, the table reports R^2=0.9491. These values differ because our fitted function is exponential, not linear.

If we instead selected Linear under the Regression Model drop down, we would obtain the same value for the correlation coefficient, since r measures linear association regardless of the function type we choose. However, the value of R^2 would instead be 0.8268, which equals r^2 in the linear case.
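To see where the two values of R^2 come from, the sketch below computes R^2 = 1 - SSE/SST for both a linear model and an exponential model of the same data. It assumes the exponential model is fitted by least squares on (t, \ln P) and that R^2 is measured against the original population values. Both are assumptions about the tool, and the helper names are illustrative.

```python
import math

# Data read from the screenshot description: t in years, P = fish population.
t_values = [0, 0.5, 1, 1.5, 1.75, 2, 2.25, 2.5, 3.75]
p_values = [1000, 550, 500, 425, 350, 290, 210, 160, 100]


def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - sse / sst


# Linear model: P = m * t + c
m, c = linear_fit(t_values, p_values)
linear_r2 = r_squared(p_values, [m * t + c for t in t_values])

# Exponential model: ln P = ln a + b * t, so P = e^(ln a + b * t)
b, ln_a = linear_fit(t_values, [math.log(p) for p in p_values])
exp_r2 = r_squared(p_values, [math.exp(ln_a + b * t) for t in t_values])

print(f"Linear R^2:      {linear_r2:.4f}")
print(f"Exponential R^2: {exp_r2:.4f}")
```

Under these assumptions the linear value should be close to 0.8268 and the exponential value close to 0.9491, which suggests the exponential model explains more of the variation in this data set.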

Idea summary

Linear and exponential data can be fitted to a regression model. We can analyze the closeness of the fit using the coefficient of determination.

Analyzing fitted functions

Exploration

Consider the scatter plot of data relating the number of guests at a restaurant and the cost of the meal and the residual plot of the data:

Figure: Cost of meal vs number of guests. Scatter plot with \text{Number of guests }x on the horizontal axis and \text{Cost (in dollars) }y on the vertical axis. Line of best fit: y=12.07x+0.04

Figure: Residual plot. Plot of residuals for the 'Cost of meal vs number of guests' graph, with x on the horizontal axis and \text{Residual } y on the vertical axis.
  1. Compare the points on the scatter plot with the points on the residual graph. What do you notice about the relationship of the points?

From a scatter plot and a line of fit, we can further analyze an association between two variables by examining the residuals of the model.

Residual

The residual is the difference between the actual output value at a given x and the output value predicted for that x by the line of best fit.

\text{residual}=\text{actual}-\text{predicted}

By taking the residuals of each point in the data set and plotting them at their corresponding x-values, we form a residual plot for the data.

The residual plot is constructed using the same x-axis scale and x-coordinates from the original scatter plot, and plotting the residual values as the y-coordinates.

A residual plot can be used to decide if a straight line is an appropriate model for the data. It also indicates the strength of the relationship by showing how much the model over-predicts (negative residual) or under-predicts (positive residual) the actual data. Looking for unusually large residuals can help us identify outliers in the data set.

Two key features will help provide evidence about whether or not a linear model is appropriate, and indicate the strength of the relationship:

  • Pattern - if the linear model fitted is appropriate, then points on the residual plot should be randomly scattered about the x-axis without a noticeable pattern.

  • Size of residuals - residuals that are small in size relative to the data being predicted indicate a stronger association. Large residuals would indicate the model significantly under- or over-predicts the actual data.

Following are some example scatter plots with the line of best fit and residuals, and their corresponding residual plots.

Scatter plot and residual plot of weak positive linear association:

Figure: Scatter plot with residuals, with x on the horizontal axis and y on the vertical axis.

Figure: Residual plot, with x on the horizontal axis and \text{Residual } y on the vertical axis.

From the scatterplot we can see the association is positive. The residual plot has no obvious pattern, suggesting a linear model is appropriate. The residuals are relatively large indicating a weak relationship.

Scatter plot and residual plot of strong negative linear association with an outlier:

Figure: Scatter plot with residuals, with x on the horizontal axis and y on the vertical axis.

Figure: Residual plot, with x on the horizontal axis and \text{Residual } y on the vertical axis.

From the scatterplot we can see the association is negative. Other than the outlier, the residuals are relatively small, indicating a strong relationship. The outlier in the scatterplot stands out in the residual plot. Its inclusion leads to most of the data points being over-predicted by the best fit line.

Scatter plot and residual plot of non-linear association:

Figure: Scatter plot with residuals, with x on the horizontal axis and y on the vertical axis.

Figure: Residual plot, with x on the horizontal axis and \text{Residual } y on the vertical axis.

The residual plot displays a clear pattern, indicating that a linear model is not appropriate for this data set.

Examples

Example 8

The scatter plot shows the relationship between the electricity usage of a household and the cost of their monthly utility bill.

Figure: Cost of energy usage. Scatter plot with \text{Energy Usage (in kWh)} on the horizontal axis and \text{Amount (in dollars)} on the vertical axis.

The equation of the line of best fit is y=0.255x-81.49

The residual plot of the data is shown:

Figure: Residual plot, with x on the horizontal axis and \text{Residual }y on the vertical axis.

a

Interpret the strength and linear association of the data using the line of best fit and residual plot.

Worked Solution
Apply the idea

The association between the data is a strong positive linear association.

Since the slope of the line of fit is positive, the correlation is also positive. The data points mostly appear close to the line with relatively small residuals so we will describe this correlation as a strong positive correlation.

The residual plot has no obvious pattern suggesting a linear model is appropriate for the data.

Reflect and check

The points on the residual plot are not close to the x-axis like the example of the strong linear association in the lesson. However, the residuals show that the predicted and actual values are mostly within \$10 of each other, which is relatively close for a prediction involving bills of up to \$250.

It is important to consider the size of the residuals relative to what you are predicting. This analysis leaves some room for interpretation, depending on how precise our predictions need to be.

b

Find and interpret the residual for the point \left(930, 150\right).

Worked Solution
Create a strategy

The residual for each point is its vertical distance from the line of fit.

Apply the idea

A residual can be calculated by finding the difference between the actual output of the data point and the output predicted by the line of fit. This can be written as the formula: \text{residual}=\text{actual}-\text{predicted}

The point \left(930, 150\right) tells us that an x-value of 930 has an actual y-value of 150. We can find the predicted value using the equation of the line of fit.

\begin{aligned}
y &= 0.255x-81.49 && \text{Line of fit} \\
&= 0.255(930)-81.49 && \text{Substitute } x=930 \\
&= 237.15-81.49 && \text{Evaluate the multiplication} \\
&= 155.66 && \text{Evaluate the subtraction}
\end{aligned}

Find the residual:

\begin{aligned}
\text{residual} &= \text{actual}-\text{predicted} && \text{Residual formula} \\
&= 150-155.66 && \text{Substitute in the actual and predicted } y\text{-values} \\
&= -5.66 && \text{Evaluate the subtraction}
\end{aligned}

This means at a usage of 930 \text{ kWh} the trendline would have over-predicted the actual bill of \$150 by \$5.66.
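The same calculation can be written as a short script. This is a minimal sketch using the line of fit y=0.255x-81.49 from the example; the function names `predict_cost` and `residual` are illustrative.

```python
def predict_cost(usage_kwh):
    """Predicted bill in dollars from the line of fit y = 0.255x - 81.49."""
    return 0.255 * usage_kwh - 81.49


def residual(actual, predicted):
    """residual = actual - predicted"""
    return actual - predicted


predicted = predict_cost(930)
res = residual(150, predicted)

# A negative residual means the line over-predicts; a positive one means it under-predicts.
if res < 0:
    direction = "over-predicts"
elif res > 0:
    direction = "under-predicts"
else:
    direction = "matches"

print(f"Predicted: {predicted:.2f}, residual: {res:.2f}, the line {direction} the actual bill")
```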

Reflect and check

The sign of the residual tells us if the trendline over- or under-predicts the actual data:

  • A residual will be positive if the data lies above the line. Hence, the line under-predicts the value.

  • A residual will be negative if the data lies below the line. Hence, the line over-predicts the value.

We can visually check our answer by looking at the residual plot to see if the residual at x=930 is between -5 and -6.

Example 9

Consider the following data set and scatterplot with line of fit.

x   10  11  13  18  19  21  23  25  28  29  31
y   12  13   9   8   7   7   4   2   3  -1  -2
Figure: Scatter plot of the data with line of fit, with x on the horizontal axis and y on the vertical axis.

a

Create a residual plot for the data.

Worked Solution
Create a strategy

The residual for each point is its vertical distance from the line of fit. We want to find this for each point, and plot it against the same x-axis scale.

Apply the idea

A residual is calculated by finding the difference between the actual output of each data point and the output predicted by the line of fit.

For example, the point \left(10,12\right) tells us that an x-value of 10 has an actual y-value of 12. We can find the predicted value using the equation of the line of fit.

\begin{aligned}
y &= -0.653x+19.17 && \text{Line of fit} \\
&= -0.653(10)+19.17 && \text{Substitute } x=10 \\
&= -6.53+19.17 && \text{Evaluate the multiplication} \\
&= 12.64 && \text{Evaluate the addition}
\end{aligned}

Find the residual:

\begin{aligned}
\text{residual} &= \text{actual}-\text{predicted} && \text{Residual formula} \\
&= 12-12.64 && \text{Substitute in the actual and predicted } y\text{-values} \\
&= -0.64 && \text{Evaluate the subtraction}
\end{aligned}

This residual value would be located at the point (10, -0.64). We can repeat this process for the remainder of x-values in the table to determine their residual points for the residual plot.
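Repeating the calculation for every point is quick to automate. The sketch below assumes the data table reads as eleven (x, y) pairs, starting with (10, 12) and ending with (31, -2), and uses the line of fit y=-0.653x+19.17 from the worked solution. The names `line_of_fit` and `residual_points` are illustrative.

```python
# Assumed reading of the data table as eleven (x, y) pairs.
x_values = [10, 11, 13, 18, 19, 21, 23, 25, 28, 29, 31]
y_values = [12, 13, 9, 8, 7, 7, 4, 2, 3, -1, -2]


def line_of_fit(x):
    """Predicted y-value from the line of fit y = -0.653x + 19.17."""
    return -0.653 * x + 19.17


# Each residual plot point keeps the original x-value and uses
# residual = actual - predicted as its y-coordinate.
residual_points = [
    (x, round(y - line_of_fit(x), 2)) for x, y in zip(x_values, y_values)
]

for x, r in residual_points:
    print(f"({x}, {r:+.2f})")
```

Plotting these points against the same x-axis scale as the scatter plot gives the residual plot. Because the line of fit is a least-squares line, the residuals should sum to roughly zero, apart from rounding in the coefficients.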

Figure: Residual plot, with x on the horizontal axis and \text{Residual }y on the vertical axis.

A rough sketch of the residual plot can be created by estimating the vertical distance between each data point and the line of fit.

b

Determine if a linear model is an appropriate choice for the data.

Worked Solution
Create a strategy

The residual plot can be used to determine if a linear model is an appropriate choice for the data.

Apply the idea

Since the data points are randomly dispersed around the x-axis on the residual plot of the data, a linear model appears to be appropriate for the data.

Idea summary

A residual plot shows the strength of the correlation between two variables. The closer the data points on a residual plot are to the x-axis, the stronger the correlation between the data. A model is considered strong when the residuals are small relative to the value being predicted. Calculate the residuals for a residual plot using the formula: \text{residual}=\text{actual}-\text{predicted}

In general, a residual plot with points randomly dispersed about the x-axis indicates that the model is appropriate for the data.

Outcomes

2.6.A

Construct linear, quadratic, and exponential models based on a data set.

2.6.B

Validate a model constructed from a data set.
