7. Probability & Statistics

Bivariate data can be modeled with a **curve of best fit**, also called a **regression model**. If the relationship between the variables is linear, a line of best fit can be used to model the data. If the relationship is not linear, the data may be modeled by a curve, such as an exponential or quadratic model.

When choosing a model to represent data and make predictions, we can use what we know about the key features of the functions we have learned so far and match them to the behavior of the data.
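One way to match key features to data is to look at how the outputs change over equally spaced inputs: roughly constant first differences suggest a linear model, roughly constant second differences suggest a quadratic model, and roughly constant ratios suggest an exponential model. The sketch below illustrates this on three made-up data sets generated from known functions. The helper names and the sample functions are assumptions chosen for illustration, and real data will only approximately show these patterns.

```python
def first_differences(ys):
    """Differences between consecutive outputs (inputs assumed equally spaced)."""
    return [b - a for a, b in zip(ys, ys[1:])]


def successive_ratios(ys):
    """Ratios between consecutive outputs (inputs assumed equally spaced)."""
    return [b / a for a, b in zip(ys, ys[1:])]


# Illustrative data generated from known functions, not taken from the lesson.
xs = [0, 1, 2, 3, 4, 5]
linear = [3 + 2 * x for x in xs]             # y = 2x + 3
quadratic = [x**2 - 4 * x + 5 for x in xs]   # y = x^2 - 4x + 5
exponential = [2 * 3**x for x in xs]         # y = 2 * 3^x

linear_first = first_differences(linear)
quadratic_second = first_differences(first_differences(quadratic))
exponential_ratios = successive_ratios(exponential)

print(linear_first)        # [2, 2, 2, 2, 2]  -> constant first differences
print(quadratic_second)    # [2, 2, 2, 2]     -> constant second differences
print(exponential_ratios)  # [3.0, 3.0, 3.0, 3.0, 3.0] -> constant ratios
```

With measured data, the differences or ratios will vary a little, so we look for the pattern that is closest to constant rather than exactly constant.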

Each table shown represents a different set of data.

Table 1

| x | 0 | 0.2 | 0.4 | 0.6 | 0.8 | 1 | 1.2 | 1.4 | 1.6 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| y | 13 | 7 | 4 | 3 | 1 | 0 | 2 | 3 | 6 | 9 |

Table 2

| x | 3 | 3.5 | 4 | 4.5 | 5 | 5.5 | 6 | 6.5 | 7 | 7.5 | 8 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 63 | 68 | 77 | 90 | 104 | 100 | 112 | 120 | 114 | 127 | 127 |

Table 3

| x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 4 | 5 | 5 | 6 | 7 | 8 | 16 | 30 | 37 | 66 | 92 |

Without creating a scatterplot:

What type of function would best model the data in Table 1? Explain your answer.

What type of function would best model the data in Table 2? Explain your answer.

What type of function would best model the data in Table 3? Explain your answer.

The correlation coefficient is used to determine the strength of a linear relationship, but it cannot describe the strength of nonlinear relationships. Instead, we can analyze the **coefficient of determination** (R^{2}). The value of R^{2} can vary between 0 and 1. The closer the value is to 1, the better the model fits the data.
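As a rough illustration of what R^{2} measures, the sketch below uses one common definition: 1 minus the ratio of the squared prediction errors to the squared deviations of the data from its mean. The function name `r_squared` and the sample values are assumptions for illustration, and a calculator or statistics package may compute the value differently for some nonlinear models.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(observed) / len(observed)
    # Sum of squared errors between the data and the model's predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Sum of squared deviations of the data from its own mean.
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Illustrative values only, not data from the lesson.
observed = [1, 2, 3]
perfect_fit = r_squared(observed, [1, 2, 3])    # model matches every point
mean_only_fit = r_squared(observed, [2, 2, 2])  # model always predicts the mean

print(perfect_fit)    # 1.0
print(mean_only_fit)  # 0.0
```

A model that predicts every point exactly scores 1, while a model that does no better than always guessing the mean scores 0.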

The population of fish in a small lake over time, given in years, is shown in the table:

| Years | Fish Population | Years | Fish Population |
|---|---|---|---|
| 0 | 1000 | 2 | 290 |
| 0.25 | 820 | 2.25 | 210 |
| 0.5 | 650 | 2.5 | 160 |
| 0.75 | 665 | 2.75 | 145 |
| 1 | 500 | 3 | 120 |
| 1.25 | 490 | 3.25 | 120 |
| 1.5 | 425 | 3.5 | 100 |
| 1.75 | 350 | 3.75 | 100 |

(a) Determine whether a linear or exponential model best fits the relationship between the years and the population of fish.

(b) Calculate the regression model for the data and use it to predict the population of fish in the lake after 4 years.
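One possible way to approach the fish population example with code is sketched below. It fits a least-squares line and an exponential model of the form y = a·e^(kx) to the table, compares their R^{2} values, and evaluates the exponential model at 4 years. The exponential fit here is obtained by fitting a line to the natural log of the population, which is an assumption about the method, so a graphing calculator's regression may give slightly different coefficients. The helper names `fit_line` and `r_squared` are hypothetical.

```python
import math

# Data from the fish population table (years 0 to 3.75 in steps of 0.25).
years = [i * 0.25 for i in range(16)]
fish = [1000, 820, 650, 665, 500, 490, 425, 350,
        290, 210, 160, 145, 120, 120, 100, 100]


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Linear model: y = m * x + c.
m, c = fit_line(years, fish)
r2_linear = r_squared(fish, [m * x + c for x in years])

# Exponential model: y = a * e^(k * x), fitted as a line through ln(y).
k, ln_a = fit_line(years, [math.log(y) for y in fish])
a = math.exp(ln_a)
r2_exponential = r_squared(fish, [a * math.exp(k * x) for x in years])

# Prediction from the exponential model after 4 years.
prediction_4_years = a * math.exp(k * 4)

print(f"linear:      y = {m:.1f}x + {c:.1f}, R^2 = {r2_linear:.3f}")
print(f"exponential: y = {a:.1f} * e^({k:.3f}x), R^2 = {r2_exponential:.3f}")
print(f"predicted population after 4 years: {prediction_4_years:.0f}")
```

Comparing the two R^{2} values gives a numerical check on the choice of model in part (a). The prediction at 4 years lies just outside the observed domain, so it should be treated as an estimate.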

Jocelyn plays on the basketball team and wants to improve her shot. She notices that she is really good at making shots from certain distances, but she is not as good at making shots from other distances. She decides to investigate this further using the data cycle.

(a) Formulate a statistical question that Jocelyn can use to investigate the relationship between the shots she makes and the distance she is from the hoop.

(b) Describe a method Jocelyn can use to collect the data.

(c) Jocelyn collected data on the percentage of shots she made from various distances. Her data is shown in the table.

| Distance (ft) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|
| Shots made (%) | 33 | 36 | 47 | 56 | 69 | 71 | 81 | 88 | 88 | 91 |

| Distance (ft) | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|
| Shots made (%) | 93 | 88 | 84 | 81 | 70 | 68 | 57 | 49 | 39 |

Organize the data into a scatterplot.

(d) Determine the type of model that fits the data best, and find the equation of the curve of best fit.
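Jocelyn's shooting percentage rises and then falls as distance increases, which is the shape of a quadratic model. The sketch below shows one way to find a least-squares quadratic y = ax^2 + bx + c for her data by solving the normal equations directly, then reports the vertex and R^{2}. The helper names `solve_3x3`, `fit_quadratic`, and `r_squared` are hypothetical, and a graphing calculator's quadratic regression is an equally valid route to the coefficients.

```python
# Data from Jocelyn's table: distance in feet and percentage of shots made.
distance = list(range(2, 21))
made = [33, 36, 47, 56, 69, 71, 81, 88, 88, 91,
        93, 88, 84, 81, 70, 68, 57, 49, 39]


def solve_3x3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    rows = [row[:] + [value] for row, value in zip(matrix, rhs)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for j in range(col, n + 1):
                rows[r][j] -= factor * rows[col][j]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(rows[r][j] * solution[j] for j in range(r + 1, n))
        solution[r] = (rows[r][n] - known) / rows[r][r]
    return solution


def fit_quadratic(xs, ys):
    """Least-squares coefficients (a, b, c) for y = a*x^2 + b*x + c."""
    s = [sum(x**p for x in xs) for p in range(5)]                    # sums of x^p
    t = [sum((x**p) * y for x, y in zip(xs, ys)) for p in range(3)]  # sums of x^p * y
    normal_matrix = [[s[4], s[3], s[2]],
                     [s[3], s[2], s[1]],
                     [s[2], s[1], s[0]]]
    return solve_3x3(normal_matrix, [t[2], t[1], t[0]])


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


a, b, c = fit_quadratic(distance, made)
vertex_distance = -b / (2 * a)  # distance at which the model peaks
r2 = r_squared(made, [a * x**2 + b * x + c for x in distance])

print(f"y = {a:.3f}x^2 + {b:.3f}x + {c:.3f}")
print(f"model peaks near {vertex_distance:.1f} ft, R^2 = {r2:.3f}")
```

The x-coordinate of the vertex estimates the distance at which Jocelyn's shooting percentage is highest, which is useful when answering her statistical question.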

(e) Use the data to answer the statistical question from part (a).

Idea summary

Bivariate data can be modeled by curves of best fit such as linear functions, quadratic functions, or exponential functions. We can analyze how the data changes over the domain to choose the function that is most appropriate.

The correlation coefficient, r, describes how well a linear model fits the data. The coefficient of determination, R^{2}, describes how well a model fits the data, including a nonlinear model. An R^{2} value closer to 1 represents a stronger fit.

Different functions can be used to model different situations. Sometimes, it takes a combination of functions to see the full picture. A curve of best fit may only be appropriate over a part of the domain, and we can create a piecewise function to model the data over different intervals of the domain.

Loren posted a video on his social media account and noticed that the video was gaining lots of views. He collected data on the number of views the video had at the end of each day, as shown in the table.

| Days | Views | Days | Views |
|---|---|---|---|
| 1 | 100 | 11 | 16526 |
| 2 | 132 | 12 | 17538 |
| 3 | 212 | 13 | 19192 |
| 4 | 485 | 14 | 20537 |
| 5 | 1226 | 15 | 23264 |
| 6 | 2779 | 16 | 23831 |
| 7 | 6838 | 17 | 24809 |
| 8 | 12000 | 18 | 27249 |
| 9 | 13393 | 19 | 29728 |
| 10 | 14262 | 20 | 29942 |

(a) Is the relationship between the days and the number of views best approximated by a line, quadratic curve, exponential curve, or a combination of these functions? Explain.

(b) Use your answer from part (a) to find the regression model for the data.
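If the answer to part (a) is a combination of functions, one possible piecewise model can be sketched as follows: an exponential curve for the early days, when views multiply quickly, and a line for the later days, when views rise by a roughly steady amount. The break after day 8 is an assumption made for illustration and should be checked against a scatterplot of the data. The exponential piece is fitted through the natural log of the views, and the names `BREAK_DAY`, `fit_line`, and `views_model` are hypothetical.

```python
import math

# Data from Loren's table: views at the end of each day.
days = list(range(1, 21))
views = [100, 132, 212, 485, 1226, 2779, 6838, 12000, 13393, 14262,
         16526, 17538, 19192, 20537, 23264, 23831, 24809, 27249, 29728, 29942]

BREAK_DAY = 8  # assumed boundary between the two pieces; adjust after plotting


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


early = [(d, v) for d, v in zip(days, views) if d <= BREAK_DAY]
late = [(d, v) for d, v in zip(days, views) if d > BREAK_DAY]

# Exponential piece: views = a * e^(k * day), fitted as a line through ln(views).
k, ln_a = fit_line([d for d, _ in early], [math.log(v) for _, v in early])
a = math.exp(ln_a)

# Linear piece: views = m * day + c.
m, c = fit_line([d for d, _ in late], [v for _, v in late])


def views_model(day):
    """Piecewise model: exponential up to BREAK_DAY, linear afterwards."""
    if day <= BREAK_DAY:
        return a * math.exp(k * day)
    return m * day + c


print(f"days 1 to {BREAK_DAY}: views = {a:.1f} * e^({k:.3f} * day)")
print(f"days {BREAK_DAY + 1} to 20: views = {m:.1f} * day + {c:.1f}")
```

Moving `BREAK_DAY` by a day in either direction and comparing how well each piece fits its interval is a reasonable way to settle on the domain for each function.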

Idea summary

Piecewise functions can represent data that have different characteristics at different intervals. To find the curves of best fit in a piecewise function, we must first create a scatterplot and analyze the data. Then, we will need to determine an appropriate domain for each function type.