
5.05 Making predictions using time series data

Lesson

The purpose of smoothing time series data using a moving average or deseasonalising the data is to take out the 'peaks' and 'troughs' and view the underlying trend. Often, but not always, the smoothed data will appear to be linear in nature. If it is linear in nature we can calculate the least-squares regression line for the smoothed data and use this line to make future predictions. Making future predictions with time series data is also called forecasting.

Once we have a predicted value from the underlying trend line, we will need to factor back in the season it's from. If it's from a 'peak' season then our prediction should be adjusted upwards. If it's from a 'trough' season then it should be adjusted downwards. We multiply the predicted score from the regression line by the appropriate seasonal index to adjust the predicted value.

Predicting from time series data
Step Action
1. Smooth the data using the appropriate moving average or by deseasonalising the data.
2. Give each time period a number, e.g. Mon Week 1 becomes $t=1$, Tues Week 1 becomes $t=2$, etc.
3. Calculate the equation of the least-squares regression line using the time period in list 1 and the smoothed data in list 2 of your calculator. Round numbers to four decimal places unless instructed otherwise.
4. Substitute a future time value into the equation of the least-squares regression line to obtain a predicted score.
5. Multiply the predicted score by the appropriate seasonal index to factor back in the seasonality of the data.
6. If asked to comment on the reliability of the prediction, consider whether the future time value is close to the data. Generally predicting within one cycle of the existing data is considered reliable.
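Steps 2 to 5 can also be sketched in code. The following is a minimal illustration rather than a replacement for the calculator method: the function names `fit_least_squares` and `forecast`, and the example values, are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
def fit_least_squares(ts, ys):
    """Return (slope, intercept) of the least-squares line y = slope * t + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    s_tt = sum((t - mean_t) ** 2 for t in ts)
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    return slope, intercept


def forecast(smoothed, future_t, seasonal_index):
    """Number the periods, fit the line, predict, then re-seasonalise."""
    ts = list(range(1, len(smoothed) + 1))               # step 2: t = 1, 2, 3, ...
    slope, intercept = fit_least_squares(ts, smoothed)   # step 3: regression line
    trend_value = slope * future_t + intercept           # step 4: substitute future t
    return trend_value * seasonal_index                  # step 5: factor seasonality back in


# Hypothetical smoothed values that rise by exactly 2 per period,
# so the fitted line is y = 2t + 8 and the trend value at t = 5 is 18.
example_smoothed = [10.0, 12.0, 14.0, 16.0]
example_forecast = forecast(example_smoothed, future_t=5, seasonal_index=1.5)
print(example_forecast)
```

With a made-up seasonal index of $1.5$, the trend value of $18$ is scaled up to $27$, which mirrors the "adjust upwards for a peak season" idea described earlier.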

Note:

  • Moving average data or deseasonalised data can be used to find the regression line. Read the question carefully!
  • Correlation has little meaning with time series data as the smoothed line will almost always have a strong, linear correlation for an underlying linear trend.
  • Time series predictions are almost always future predictions so we don't tend to use the words 'extrapolation' or 'interpolation' with these problems. Instead, we consider how close the predicted value is to the original data set and use the words 'within one cycle'.
  • Showing working is essential in these types of problems. You should always state the value of $t$ that you are using, the equation of the regression line and the seasonal index you are multiplying by.
  • Time is always the explanatory variable (plotted on the horizontal axis).

 

Worked example

Example 1

An aluminium window company records quarterly figures on the number of windows manufactured. For planning purposes the company management wishes to predict the number of windows that will be manufactured in March 2020.

| Month/Yr | Time ($t$) | Raw data | Proportion of yearly mean | Deseasonalised data |
|---|---|---|---|---|
| Mar 2017 | $1$ | $471$ | $0.9598$ | $505.81$ |
| Jun 2017 | $2$ | $480$ | $0.9781$ | $485.01$ |
| Sep 2017 | $3$ | $492$ | $1.0025$ | $485.43$ |
| Dec 2017 | $4$ | $520$ | $1.0596$ | $487.97$ |
| Mar 2018 | $5$ | $427$ | $0.9143$ | $458.56$ |
| Jun 2018 | $6$ | $463$ | $0.9914$ | $467.84$ |
| Sep 2018 | $7$ | $484$ | $1.0364$ | $477.54$ |
| Dec 2018 | $8$ | $494$ | $1.0578$ | $463.57$ |
| Mar 2019 | $9$ | $425$ | $0.9190$ | $456.41$ |
| Jun 2019 | $10$ | $462$ | $0.9995$ | $466.83$ |
| Sep 2019 | $11$ | $463$ | $1.0016$ | $456.82$ |
| Dec 2019 | $12$ | $499$ | $1.0795$ | $468.26$ |

Think: Look at the table carefully. We can see that steps 1 and 2 have been completed already.

  • Step 1: the data has been smoothed. The deseasonalised data can be fitted to a linear regression model.

  • Step 2: the time periods have been allocated a number ($t=1$, $t=2$, $t=3$, etc.).

Do: Step 3: calculate the least-squares regression line. Using our calculator, we enter the time values in list 1 and the deseasonalised data in list 2, then fit a linear regression model. To complete the remaining steps, we use the linear regression model to make a prediction for March 2020, then adjust this value to include seasonality.

Write down the equation $y=-3.2519t+494.4746$ as the equation of the least-squares regression line. To predict for March 2020 we first need to find the $t$ value. March 2019 was $t=9$, so one cycle (four more time periods) ahead means March 2020 will be $t=13$. We now substitute that value into our regression line to make our deseasonalised prediction.

$y=-3.2519t+494.4746$ (writing down the least-squares equation)

$y=-3.2519\times13+494.4746$ (substituting $t=13$)

$y=452.1999$ (simplifying)
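As a cross-check on the calculator output, the regression can be refitted from the deseasonalised column of the table. This is only a sketch: it uses the rounded table values, so the fitted coefficients should agree with $-3.2519$ and $494.4746$ only to within rounding, and the variable names are illustrative.

```python
# Deseasonalised values copied from the table, for t = 1 to 12.
deseasonalised = [505.81, 485.01, 485.43, 487.97, 458.56, 467.84,
                  477.54, 463.57, 456.41, 466.83, 456.82, 468.26]
ts = list(range(1, 13))

# Least-squares slope and intercept computed from the usual formulas.
n = len(ts)
mean_t = sum(ts) / n
mean_y = sum(deseasonalised) / n
s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, deseasonalised))
s_tt = sum((t - mean_t) ** 2 for t in ts)
slope = s_ty / s_tt
intercept = mean_y - slope * mean_t

# Deseasonalised prediction for March 2020 (t = 13),
# using the rounded coefficients quoted in the lesson.
prediction_t13 = -3.2519 * 13 + 494.4746

print(slope, intercept)
print(prediction_t13)
```

Because the table values are rounded to two decimal places, the last decimal place of the fitted intercept may differ slightly from the calculator result.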

 

Now we want to adjust for seasonality. We've just found the deseasonalised value for March 2020 which is the predicted value taken from the deseasonalised values. What we're interested in is the more realistic value that includes the seasonal trend in our actual data. So we must multiply our predicted value by the appropriate seasonal index. In this case we want to multiply by the seasonal index for March.

The seasonal indices for this data are given in the table below.

| March | June | September | December |
|---|---|---|---|
| $0.9310$ | $0.9897$ | $1.0135$ | $1.0656$ |

So the predicted value for March 2020 is $452.1999\times0.9310=420.9981$. In other words, the window company predicts that $421$ windows will be manufactured in March 2020.
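The seasonal adjustment can be sketched in code as well. The snippet below assumes each seasonal index is the average of that quarter's "Proportion of yearly mean" values from the first table; that assumption does reproduce the four indices quoted above. The dictionary and variable names are illustrative only.

```python
# "Proportion of yearly mean" values from the table, grouped by quarter.
proportions = {
    "March": [0.9598, 0.9143, 0.9190],
    "June": [0.9781, 0.9914, 0.9995],
    "September": [1.0025, 1.0364, 1.0016],
    "December": [1.0596, 1.0578, 1.0795],
}

# Assumed method: average each quarter's proportions and round to four decimal places.
seasonal_index = {
    quarter: round(sum(values) / len(values), 4)
    for quarter, values in proportions.items()
}

# Re-seasonalise the deseasonalised prediction for March 2020 (t = 13).
deseasonalised_prediction = 452.1999
march_2020 = deseasonalised_prediction * seasonal_index["March"]

print(seasonal_index)
print(round(march_2020))
```

Swapping `"March"` for another quarter gives the matching adjustment for a prediction in that quarter, provided the deseasonalised prediction is recalculated with the corresponding value of $t$.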

Reflect: The original data as shown below contains peaks and troughs that reflect the seasonality. We use the deseasonalised values to fit a least-squares line, which is then used to make a deseasonalised prediction. To include seasonality, this prediction is adjusted using the seasonal index.

 

How reliable is our prediction?

When using time series data our prediction will almost always be an extrapolation. As such, we know our predictions might be somewhat unreliable due to our inability to accurately predict future events. However, just as when making predictions in Chapter 2, we will generally consider a prediction made close to the existing data range to be reliable. For the purpose of this course, we will view predictions that have been made within one cycle of the available data to be close and hence, reliable.

In the example above, one cycle was four quarters, and our last available piece of data was December 2019. So a prediction made for any of the quarters in 2020 would be considered reliable, as it is within one cycle of December 2019. Predictions for 2021 and beyond are forecasts well beyond the data range and hence are considered unreliable.
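The "within one cycle" rule can be written as a simple check. This is a sketch of the course convention only, not a statistical test, and the function name `within_one_cycle` is hypothetical.

```python
def within_one_cycle(last_t, future_t, cycle_length):
    """True if future_t lies at most one full cycle after the last observed period."""
    return last_t < future_t <= last_t + cycle_length


last_observed_t = 12      # December 2019 in the worked example
quarters_per_cycle = 4    # one cycle is four quarters

march_2020_reliable = within_one_cycle(last_observed_t, 13, quarters_per_cycle)
december_2020_reliable = within_one_cycle(last_observed_t, 16, quarters_per_cycle)
march_2021_reliable = within_one_cycle(last_observed_t, 17, quarters_per_cycle)

print(march_2020_reliable, december_2020_reliable, march_2021_reliable)
```

Under this convention every quarter of 2020 ($t=13$ to $t=16$) counts as reliable, while March 2021 ($t=17$) does not.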

 

Practice questions

Question 1

The following data shows the sales of air conditioners at a leading retailer over four quarters of three consecutive years.

| Time period | Time ($t$) | Number of air conditioners sold | Proportion of yearly mean |
|---|---|---|---|
| March year $1$ | $1$ | $1042$ | $0.8529$ |
| June year $1$ | $2$ | $486$ | $0.3978$ |
| Sept year $1$ | $3$ | $613$ | $0.5017$ |
| Dec year $1$ | $4$ | $2746$ | $2.2476$ |
| March year $2$ | $5$ | $1160$ | $0.8183$ |
| June year $2$ | $6$ | $609$ | $0.4296$ |
| Sept year $2$ | $7$ | $1139$ | $0.8035$ |
| Dec year $2$ | $8$ | $2762$ | $1.9485$ |
| March year $3$ | $9$ | $1795$ | $0.9638$ |
| June year $3$ | $10$ | $1181$ | $0.6341$ |
| Sept year $3$ | $11$ | $1094$ | $0.5874$ |
| Dec year $3$ | $12$ | $3380$ | $1.8148$ |

  1. Calculate the seasonal component for the quarters ending in March, June, September and December, rounding to four decimal places if necessary.

    | March | June | September | December |
    |---|---|---|---|
    | ____ | ____ | ____ | ____ |

  2. The data is smoothed using a $4$-point centred moving average (4CMA) as shown in the table below. Calculate the missing values.

    | Time period | Time ($t$) | Number of air conditioners sold | 4CMA |
    |---|---|---|---|
    | March year $1$ | $1$ | $1042$ | |
    | June year $1$ | $2$ | $486$ | |
    | Sept year $1$ | $3$ | $613$ | $1236.5$ |
    | Dec year $1$ | $4$ | $2746$ | $1266.625$ |
    | March year $2$ | $5$ | $1160$ | $1347.75$ |
    | June year $2$ | $6$ | $609$ | ____ |
    | Sept year $2$ | $7$ | $1139$ | $1496.875$ |
    | Dec year $2$ | $8$ | $2762$ | $1647.75$ |
    | March year $3$ | $9$ | $1795$ | $1713.625$ |
    | June year $3$ | $10$ | $1181$ | ____ |
    | Sept year $3$ | $11$ | $1094$ | |
    | Dec year $3$ | $12$ | $3380$ | |

  3. Use your calculator to calculate the equation of the least squares regression line that fits the 4CMA data.

    Give the equation of the line in the form $y=at+b$.

    Round $a$ and $b$ to four decimal places.

  4. Predict the number of air conditioners sold in the quarter ending December year $4$.

    Round your answer to the nearest whole air conditioner sold.

  5. Comment on the reliability of your prediction.

    A. Reliable due to the prediction being made within one cycle of the available data.

    B. Unreliable due to the prediction being made beyond one cycle of the available data.

Question 2

A new pop-up ice-cream shop records its sales over its first month. The data is tabulated below.

Note that the shop is only open over the weekend.

| Day | Sales (dollars) | Deseasonalised data |
|---|---|---|
| Fri Wk 1 | $2036$ | $2101.14$ |
| Sat Wk 1 | $2257$ | $2040.87$ |
| Sun Wk 1 | $1936$ | $2092.75$ |
| Fri Wk 2 | $2224$ | $X$ |
| Sat Wk 2 | $2547$ | $2303.10$ |
| Sun Wk 2 | $2060$ | $2226.79$ |
| Fri Wk 3 | $2349$ | $2424.15$ |
| Sat Wk 3 | $2706$ | $2446.88$ |
| Sun Wk 3 | $Y$ | $2431.09$ |
| Fri Wk 4 | $2435$ | $2512.90$ |
| Sat Wk 4 | $2824$ | $2553.58$ |
| Sun Wk 4 | $2398$ | $2592.15$ |

Seasonal components:

| Fri | Sat | Sun |
|---|---|---|
| $0.9690$ | $1.1059$ | $0.9251$ |

  1. On which day will the shop be most likely to need extra help?

    A. Saturday

    B. Sunday

    C. Friday

  2. Calculate the value of $X$ in the table.

    Round the value off to two decimal places if necessary.

  3. Calculate the value of $Y$ in the table.

    Round the value off to a single decimal place if necessary.

  4. Using your calculator, determine the equation of the least-squares regression line for the deseasonalised data, where $t=1$ is Friday of Week 1.

    Give the equation in the form $y=at+b$ and round any values to two decimal places.

    You can make use of $a$ and $b$ in your working.

  5. Predict the sales for Friday of the sixth week.

    Give your answer in dollars and round off any figures to two decimal places if needed.

  6. Comment on the reliability of your prediction.

    A. Reliable due to the prediction being made within one cycle of the available data.

    B. Unreliable due to the prediction being made beyond one cycle of the available data.

Outcomes

ACMGM092

fit a least-squares line to model long-term trends in time series data
