
5.08 Bivariate data with technology

Worksheet
1

Use technology to find the line of best fit for each set of data below. Write the equation with the coefficient and constant term rounded to two decimal places.

a
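| x | 24 | 37 | 19 | 31 | 32 | 22 | 14 | 30 | 23 | 40 |
|---|---|---|---|---|---|---|---|---|---|---|
| y | -7 | -8 | -3 | -6 | -9 | -8 | -2 | -8 | -8 | -12 |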
b
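| x | 44 | 39 | 41 | 50 | 45 | 55 | 48 | 54 | 44 | 43 |
|---|---|---|---|---|---|---|---|---|---|---|
| y | 2.29 | 1.57 | 3.86 | 3.14 | 4.43 | 4.86 | 3.86 | 4.71 | 4.29 | 4.14 |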
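
For readers whose "technology" is a programming language rather than a calculator, the following is a minimal sketch of a least-squares line of best fit in plain Python. The function name `line_of_best_fit` is illustrative and not part of the worksheet; the data is taken from question 1(a). A graphics calculator or spreadsheet should give the same line.

```python
def line_of_best_fit(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    c = mean_y - m * mean_x  # the line passes through (mean_x, mean_y)
    return m, c


# Data from question 1(a)
xs = [24, 37, 19, 31, 32, 22, 14, 30, 23, 40]
ys = [-7, -8, -3, -6, -9, -8, -2, -8, -8, -12]

m, c = line_of_best_fit(xs, ys)
print(f"y = {m:.2f}x {c:+.2f}")  # both values rounded to two decimal places
```

Rounding is applied only when printing, so any later calculation can still use the unrounded gradient and intercept.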
2

For each of the following sets of data:

i

Use technology to calculate the correlation coefficient. Round your answer to two decimal places.

ii

Describe the correlation between the two variables in terms of strength, direction and form.

a
| x | 3 | 6 | 9 | 12 | 15 | 18 | 21 |
|---|---|---|---|---|---|---|---|
| y | -7 | -7.35 | -7.77 | -7.56 | -7.63 | -8.05 | -7.28 |
b
| x | 9 | 11 | 13 | 15 | 17 | 19 | 21 |
|---|---|---|---|---|---|---|---|
| y | -4 | -4.5 | -4.55 | -4.6 | -4.65 | -4.7 | -4.75 |
c
| x | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|
| y | 7 | 7.4 | 7.88 | 7.64 | 7.72 | 8.2 | 7.32 |
d
| x | 2 | 6 | 7 | 14 | 17 | 22 |
|---|---|---|---|---|---|---|
| y | -0.2 | -0.9 | -0.6 | -2.0 | -2.4 | -2.0 |
e
| x | 4 | 5 | 9 | 13 | 17 | 21 |
|---|---|---|---|---|---|---|
| y | -0.2 | -0.7 | -0.4 | -1.9 | -2.4 | -0.9 |
f
| x | 3 | 6 | 9 | 14 | 14 | 22 |
|---|---|---|---|---|---|---|
| y | 0 | 1.64 | 1.25 | 3.74 | 3.82 | 0.22 |
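
The correlation coefficient in part (i) can also be computed directly. The sketch below is one way to do it in plain Python, using the data from question 2(b); the function name `correlation` is an illustrative choice. Note that the coefficient only reports the direction and strength of a linear association, so the form of the relationship still has to be judged from a scatterplot.

```python
from math import sqrt


def correlation(xs, ys):
    """Return Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Data from question 2(b)
xs = [9, 11, 13, 15, 17, 19, 21]
ys = [-4, -4.5, -4.55, -4.6, -4.65, -4.7, -4.75]

r = correlation(xs, ys)
direction = "positive" if r > 0 else "negative" if r < 0 else "no linear"
print(f"r = {r:.2f} ({direction} correlation)")
```

The cut-offs between "weak", "moderate" and "strong" differ between textbooks, so the sketch deliberately leaves that classification to the reader.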
3

Noah is a coffee vendor. He records the maximum temperature of the day and the number of coffees sold. The results are recorded in the following table:

| Maximum temperature (°C) | 28 | 32 | 31 | 33 | 31 | 26 | 25 | 29 | 35 |
|---|---|---|---|---|---|---|---|---|---|
| Number of coffees | 17 | 37 | 25 | 39 | 23 | 7 | 19 | 34 | 42 |
a

Construct a scatterplot for the data.

b

Use technology to calculate the correlation coefficient. Round your answer to two decimal places.

c

Hence, describe what happens to the sales of coffee as the temperature increases.

4

Consider the following set of data:

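| x | 15.7 | 13.1 | 16.1 | 11 | 18.6 | 15.8 | 12.7 | 12.8 | 14.3 | 16.8 |
|---|---|---|---|---|---|---|---|---|---|---|
| y | 28.3 | 28.8 | 28.4 | 29 | 27.9 | 28.4 | 28.5 | 29 | 28.5 | 28.6 |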
a

Using technology, calculate the correlation coefficient between these scores. Round your answer to two decimal places.

b

Describe the correlation between these two variables.

c

Form an equation for the line of best fit. Round all values to one decimal place.

5

Consider the following set of data:

a

Using technology, calculate the correlation coefficient between these scores. Round your answer to two decimal places.

b

Describe the correlation between these two variables.

c

Form an equation for the line of best fit. Round all values to one decimal place.

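| x | y |
|---|---|
| 48.50 | 166.10 |
| 24.60 | 156.30 |
| 38.50 | 63.40 |
| 39.70 | 143.90 |
| 42.20 | 142.20 |
| 29.60 | 148.70 |
| 40.50 | 90.90 |
| 42.60 | 174.00 |
| 45.80 | 149.50 |
| 23.80 | -52.30 |
| 47.60 | 195.50 |
| 20.80 | 65.50 |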
6

The age and price of various second-hand Mitsubishi Lancers are shown in the table below:

a

Using technology, find the equation of the line of best fit for the price, y, in terms of the age, x. Round all values to the nearest integer.

b

State the value of the vertical intercept.

c

What is the price of a Mitsubishi Lancer when it is brand new?

d

State the gradient of the line.

e

What does the gradient of the line indicate about the correlation between age and price?

f

What happens to the price of the car if its age increases by 1 year?

| Age | Price ($) |
|---|---|
| 1 | 16 000 |
| 2 | 13 000 |
| 0 | 21 990 |
| 5 | 10 000 |
| 7 | 8600 |
| 4 | 12 500 |
| 3 | 11 000 |
| 4 | 11 000 |
| 8 | 4500 |
| 2 | 14 500 |
Predictions
7

Research on the number of cigarettes smoked during pregnancy and the birth weights of the newborn babies was conducted and results displayed in the table below:

a

Calculate the correlation coefficient for the data. Round your answer to three decimal places.

b

Is the correlation positive or negative?

c

Is the correlation strong, moderate or weak?

d

Use technology to find the equation of the line of best fit of y on x. Round all values to two decimal places where necessary.

e

Use your line of best fit to predict the birth weight of a newborn whose mother smoked on average 5 cigarettes per day.

f

Is the prediction from part (e) an example of interpolation or extrapolation?

g

Is the prediction from part (e) reliable?

| Average number of cigarettes per day (x) | Birth weight in kilograms (y) |
|---|---|
| 46.30 | 3.90 |
| 13.00 | 5.80 |
| 21.40 | 5.00 |
| 25.00 | 4.80 |
| 8.60 | 5.50 |
| 36.50 | 4.50 |
| 1.00 | 7.00 |
| 17.90 | 5.10 |
| 10.60 | 5.50 |
| 13.40 | 5.10 |
| 37.30 | 3.80 |
| 18.50 | 5.70 |
8

The results (as percentages) for a practice spelling test and the real spelling test were collected for 8 students:

| Practice (x) | 56.30 | 79.00 | 59.40 | 77.00 | 71.60 | 64.40 | 61.60 | 68.20 |
|---|---|---|---|---|---|---|---|---|
| Real (y) | 48.90 | 69.30 | 51.90 | 66.00 | 62.50 | 56.10 | 54.50 | 59.20 |
a

Calculate the correlation coefficient for the scores. Round your answer to three decimal places.

b

Is the correlation positive or negative?

c

Is the correlation strong, moderate or weak?

d

Using technology, find the equation for the line of best fit of y on x. Round all values to two decimal places.

e

Use your line of best fit to predict the real spelling test result of a student who scored 60\% in their practice spelling test. Round the answer to two decimal places.

f

Is the prediction from part (e) an example of interpolation or extrapolation?

g

Is the prediction from part (e) reliable?
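
Parts (d) to (f) follow a pattern that recurs throughout this section: fit the line, substitute a value of x, then check whether that value lies inside the range of the observed data. The sketch below shows one way to do this in plain Python with the spelling test data from question 8. The helper names `line_of_best_fit` and `kind_of_prediction` are illustrative and not part of the worksheet.

```python
def line_of_best_fit(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    return m, mean_y - m * mean_x


def kind_of_prediction(x, xs):
    """Interpolation if x lies within the observed x values, else extrapolation."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"


# Data from question 8
practice = [56.30, 79.00, 59.40, 77.00, 71.60, 64.40, 61.60, 68.20]
real = [48.90, 69.30, 51.90, 66.00, 62.50, 56.10, 54.50, 59.20]

m, c = line_of_best_fit(practice, real)
x_new = 60  # practice score from part (e)
predicted = m * x_new + c
print(f"Predicted real score: {predicted:.2f}")
print(kind_of_prediction(x_new, practice))
```

The range check covers only part (f). Whether the prediction is reliable also depends on the strength of the correlation, which the check does not measure.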

9

The average number of pages read to a child each day and the child’s growing vocabulary are measured and the results are recorded in the table below:

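| Pages read per day | 25 | 27 | 29 | 3 | 13 | 31 | 18 | 29 | 29 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|
| Total vocabulary | 402 | 440 | 467 | 76 | 220 | 487 | 295 | 457 | 460 | 106 |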
a

Find the equation of the line of best fit for the total vocabulary, V, in terms of pages read per day, P. Round all values to one decimal place.

b

State the value of the vertical intercept.

c

What is the child's vocabulary if the child doesn't read any pages each day?

d

State the gradient of the line.

e

By how many words does the child's vocabulary increase for each additional page read each day?

f

Use the equation of the line of best fit to predict the total vocabulary of a child that reads 30 pages per day.

10

Concern over student use of the social media app SnappyChatty leads to a study of student marks in Mathematics versus minutes spent using the app. The results are shown in the table below:

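| Minutes, M | 292 | 153 | 354 | 253 | 11 | 42 | 195 | 7 | 162 | 254 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mark, P (%) | 26 | 63 | 13 | 37 | 97 | 89 | 51 | 98 | 59 | 36 |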
a

Find the equation of the line of best fit for the mark as a percentage, P, in terms of minutes spent using SnappyChatty, M. Round all values to one decimal place.

b

State the value of the vertical intercept.

c

Predict the mark of a student that spends no time using the app.

d

Is this a reasonable prediction? Explain your answer.

e

State the gradient of the line.

f

Describe what will happen to the student's test mark for each additional minute that the student uses SnappyChatty.

11

A sample of families were interviewed about their annual family income, x, and their average monthly expenditure, y. Results are displayed in the table below:

a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between the two variables.

c

Use technology to find the equation for the line of best fit of y on x. Round all values to three decimal places.

d

Use your line of best fit to predict the monthly expenditure for a family whose annual income is \$80\,000.

e

Which of the following annual incomes would give the most reliable prediction of average monthly expenditure: \$99\,000, \$51\,000 or \$80\,000? Explain your answer.

| Annual income (x) | Average monthly expenditure (y) |
|---|---|
| 66 000 | 1100 |
| 75 000 | 1700 |
| 65 000 | 1400 |
| 73 000 | 1300 |
| 54 000 | 600 |
| 90 000 | 1800 |
| 87 000 | 1100 |
| 87 000 | 1500 |
| 94 000 | 1800 |
| 96 000 | 2200 |
12

The forecast maximum temperature, in degrees Celsius, and the observed maximum temperature are recorded to determine the accuracy in the temperature prediction models used by the weather bureau.

a

Calculate the correlation coefficient for these temperatures. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use technology to find the equation for the line of best fit of y on x.

d

Use your line of best fit to predict the observed maximum temperature on a day in the same month when the forecast was 25 \degree \text{C}.

e

Which of the following forecast temperatures would give the most reliable predicted temperature: 39 \degree \text{C}, 25 \degree \text{C} or 32 \degree \text{C}? Explain your answer.

| Forecast (x) | Observed (y) |
|---|---|
| 30.10 | 31.20 |
| 27.20 | 27.80 |
| 26.90 | 24.30 |
| 29.00 | 27.60 |
| 30.20 | 31.60 |
| 34.20 | 31.50 |
| 33.20 | 33.70 |
| 36.90 | 34.30 |
| 27.10 | 25.00 |
| 30.10 | 31.90 |
13

During an alcohol education programme, 10 adults were offered up to 6 drinks and were then given a simulated driving test where they scored a result out of a possible 100. The results are displayed in the following table:

| Number of drinks (x) | 3 | 2 | 6 | 4 | 4 | 1 | 6 | 3 | 4 | 2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Driving score (y) | 66 | 61 | 43 | 58 | 56 | 73 | 31 | 64 | 55 | 62 |
a

Calculate the correlation coefficient for the data. Round your answer to two decimal places.

b

Describe the statistical relationship between the two variables.

c

Use technology to find the equation of the line of best fit of y on x. Round all values to one decimal place.

d

Use your line of best fit to predict the driving score of a young adult who consumed 5 drinks. Round your answer to one decimal place.

e

Comment on the validity of your prediction.

14

The estimated iron ore grades (percentage of iron content) and the actual recovered iron ore grades are recorded to determine the accuracy of the models used by a mining company. The percentages are recorded in the table below:

| Estimated (x) | 38% | 32% | 31% | 33% | 38% | 40% |
|---|---|---|---|---|---|---|
| Actual (y) | 48% | 29% | 42% | 32% | 40% | 50% |
a

Calculate the correlation coefficient between the two variables, rounded to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use technology to form an equation for the line of best fit of y on x. Round all values to two decimal places.

d

Use your line of best fit to predict the actual iron ore grade from a pit where the estimated grade is 35\%. Round your answer to two decimal places.

e

For which of the following estimated grades would the line produce the least reliable prediction of the actual grade?

A

Estimate of 32\%

B

Estimate of 29\%

C

Estimate of 45\%

15

A team of salespeople submit their expenses and their sales for the month of March to their manager. Figures are recorded in the table:

a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use technology to form an equation for the line of best fit of y on x. Round all values to two decimal places.

d

Use your line of best fit to predict the expenses of a person in this department who made sales of \$6000 for the month.

e

Explain the validity of this prediction.

| Sales (hundreds) (x) | Expenses (y) |
|---|---|
| 55.7 | 43.8 |
| 48 | 28.9 |
| 62.7 | 73.9 |
| 23.4 | 11.9 |
| 23.8 | 9 |
| 60.6 | 50 |
| 20.2 | 20.2 |
| 52.1 | 28.2 |
| 23.4 | 19.2 |
| 33.3 | 31.1 |
16

Research has been conducted to deduce whether there is any relationship between the number of pirated downloads as a percentage of total downloads, x, and the incidences of malware infection in computers, y.

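| x | 40.5 | 61.1 | 77.4 | 34.6 | 65.3 | 63.8 | 56 | 66.5 | 34.3 | 20.1 | 68.7 | 50.8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 3.7 | 7.1 | 16.7 | 10.6 | 5.2 | 15.8 | 12.7 | 12.8 | 1.5 | 0.9 | 8.5 | 12.9 |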
a

Calculate the correlation coefficient between the two variables. Round your answer to two decimal places.

b

Describe the statistical relationship between these two variables.

c

Use technology to form an equation for the line of best fit of y on x. Round all values to two decimal places.

d

Use your line of best fit to predict the number of computers in 1000 that you would expect to be affected by malware, if in that region 80\% of downloads were pirated.

e

Describe the reliability of this prediction.

17

The data in the table and scatter plot below show the frequencies per month of online marketing emails sent out to subscribers, compared with the proportion of subscribers who click and open the email:

[Scatter plot: proportion click and open (y) against frequency (x)]
| Frequency (x) | Proportion click and open (y) |
|---|---|
| 3 | 0.41 |
| 4 | 0.59 |
| 1 | 1.28 |
| 5 | 0.47 |
| 4 | 0.26 |
| 7 | 0.62 |
| 10 | 0.75 |
a

Calculate the correlation coefficient between these two variables. Round your answer to two decimal places.

b

Which piece of data appears to be an outlier?

c

Remove the outlier and recalculate the correlation coefficient.

d

Describe the statistical relationship between the two variables.

e

Use technology to find the equation for the line of best fit of y on x (with the outlier removed). Round values to two decimal places.

f

Use your line of best fit to predict the proportion of emails opened if they're sent 20 times a month.

g

Comment on the validity of your prediction.
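
The effect of removing an outlier, as in parts (a) to (c), can be explored in code. The sketch below uses the data from question 17 and plain Python. The worksheet expects the outlier to be identified by inspecting the scatter plot; flagging the point with the largest absolute residual is a substitute heuristic assumed here for illustration, not a formal outlier test. The function name `summary` is also illustrative.

```python
from math import sqrt


def summary(points):
    """Return (gradient, intercept, r) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    s_yy = sum((y - mean_y) ** 2 for _, y in points)
    m = s_xy / s_xx
    c = mean_y - m * mean_x
    r = s_xy / sqrt(s_xx * s_yy)
    return m, c, r


# Data from question 17: (frequency, proportion click and open)
points = [(3, 0.41), (4, 0.59), (1, 1.28), (5, 0.47),
          (4, 0.26), (7, 0.62), (10, 0.75)]

m_all, c_all, r_all = summary(points)

# Assumed heuristic: the suspect point is the one furthest (vertically)
# from the line fitted to all of the data.
suspect = max(points, key=lambda p: abs(p[1] - (m_all * p[0] + c_all)))

trimmed = [p for p in points if p != suspect]
m_trim, c_trim, r_trim = summary(trimmed)

print(f"Suspected outlier: {suspect}")
print(f"r with all points: {r_all:.2f}")
print(f"r without the suspected outlier: {r_trim:.2f}")
```

Comparing the two coefficients shows how strongly a single point can influence both the correlation and the fitted line.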


Outcomes

4.1.3.3

use technology to find the line of best fit [complex]

4.1.3.5

use technology to find the correlation coefficient (an indicator of the strength of linear association) [complex]
