
5.05 Making predictions using time series data

Lesson

Predictions

The purpose of smoothing time series data, either with a moving average or by deseasonalising it, is to remove the 'peaks' and 'troughs' so that we can see the underlying trend. Often, but not always, the smoothed data will appear linear. If it does, we can calculate the least-squares regression line for the smoothed data and use this line to make future predictions. Making future predictions with time series data is also called forecasting.

Once we have a predicted value from the underlying trend line, we need to factor back in the season it belongs to. If it is from a 'peak' season, the prediction should be adjusted upwards. If it is from a 'trough' season, it should be adjusted downwards. We make this adjustment by multiplying the predicted score from the regression line by the appropriate seasonal index.

Steps to predict from time series data:

  1. Smooth the data using the appropriate moving average or by deseasonalising the data.

  2. Give each time period a number, e.g. Monday of Week 1 becomes t=1, Tuesday of Week 1 becomes t=2, and so on.

  3. Calculate the equation of the least-squares regression line using the time period in list 1 and the smoothed data in list 2 of your calculator. Round numbers to four decimal places unless instructed otherwise.

  4. Substitute a future time value into the equation of the least-squares regression line to obtain a predicted score.

  5. Multiply the predicted score by the appropriate seasonal index to factor back in the seasonality of the data.

  6. If asked to comment on the reliability of the prediction, consider whether the future time value is close to the data. Generally, a prediction within one cycle of the existing data is considered reliable.

Note:

  • Moving average data or deseasonalised data can be used to find the regression line. Read the question carefully.

  • Correlation has little meaning with time series data, because the smoothed data will almost always show a strong linear correlation when there is an underlying linear trend.

  • Time series predictions are almost always future predictions so we don't tend to use the words 'extrapolation' or 'interpolation' with these problems. Instead, we consider how close the predicted value is to the original data set and use the words 'within one cycle'.

  • Showing working is essential in these types of problems. You should always state the value of t that you are using, the equation of the regression line and the seasonal index you are multiplying by.

  • Time is always the explanatory variable (plotted on the horizontal axis).

When using time series data our prediction will almost always be an extrapolation. As such, we know our predictions might be somewhat unreliable due to our inability to accurately predict future events. However, just as when making predictions in Chapter 2, we will generally consider a prediction made close to the existing data range to be reliable. For the purpose of this course, we will view predictions that have been made within one cycle of the available data to be close and hence, reliable.
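The "within one cycle" rule can be written as a simple check. This is a sketch under an assumed reading of the rule: a prediction counts as within one cycle if it is no more than one full cycle (`period` time periods) beyond the last observation. The function name `within_one_cycle` is hypothetical.

```python
def within_one_cycle(t_future, t_last, period):
    """Return True if t_future is at most one full cycle past the last data point.

    Assumed reading of 'within one cycle': no more than `period` time periods
    beyond the final observation t_last.
    """
    return t_future - t_last <= period


# Quarterly data (4 periods per cycle), last observation t = 12, predicting t = 16
print(within_one_cycle(16, 12, 4))
# Fri-Sun data (3 periods per cycle), last observation t = 12, predicting t = 16
print(within_one_cycle(16, 12, 3))
```

Under this reading, the first call returns True and the second returns False, which agrees with the reliability conclusions in Examples 1 and 2 below.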

Examples

Example 1

The following data shows the sales of air conditioners at a leading retailer over four quarters of three consecutive years.

| | Month | Time (t) | Number of air conditioners sold | Proportion of yearly mean |
|---|---|---|---|---|
| Year 1 | March | 1 | 1042 | 0.8529 |
| | June | 2 | 486 | 0.3978 |
| | Sept | 3 | 613 | 0.5017 |
| | Dec | 4 | 2746 | 2.2476 |
| Year 2 | March | 5 | 1160 | 0.8183 |
| | June | 6 | 609 | 0.4296 |
| | Sept | 7 | 1139 | 0.8035 |
| | Dec | 8 | 2762 | 1.9485 |
| Year 3 | March | 9 | 1795 | 0.9638 |
| | June | 10 | 1181 | 0.6341 |
| | Sept | 11 | 1094 | 0.5874 |
| | Dec | 12 | 3380 | 1.8148 |
a

Calculate the seasonal component for the quarters ending in March, June, September and December, rounding to four decimal places if necessary.

Worked Solution
Create a strategy

Average the proportion of yearly mean values for each month.

Apply the idea
\text{March} = \dfrac{0.8529+0.8183+0.9638}{3} = 0.8783

\text{June} = \dfrac{0.3978+0.4296+0.6341}{3} = 0.4872

\text{September} = \dfrac{0.5017+0.8035+0.5874}{3} = 0.6309

\text{December} = \dfrac{2.2476+1.9485+1.8148}{3} = 2.0036

| March | June | September | December |
|---|---|---|---|
| 0.8783 | 0.4872 | 0.6309 | 2.0036 |
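The same seasonal indices can be computed directly from the raw sales. This is a sketch: the function name `seasonal_indices` is hypothetical. It averages the unrounded proportions of each yearly mean, whereas the worked solution averages proportions already rounded to four decimal places. For this data set, both approaches give the same four-decimal results.

```python
# Quarterly sales from Example 1, one inner list per year (Mar, Jun, Sep, Dec)
sales = [
    [1042, 486, 613, 2746],    # Year 1
    [1160, 609, 1139, 2762],   # Year 2
    [1795, 1181, 1094, 3380],  # Year 3
]


def seasonal_indices(years):
    """Average each season's proportion of its yearly mean across all years."""
    proportions = []
    for year in years:
        yearly_mean = sum(year) / len(year)
        # Proportion of the yearly mean for each season in this year
        proportions.append([value / yearly_mean for value in year])
    n_years = len(proportions)
    # zip(*proportions) groups the same season from every year together
    return [sum(season) / n_years for season in zip(*proportions)]


indices = seasonal_indices(sales)
print([round(s, 4) for s in indices])  # → [0.8783, 0.4872, 0.6309, 2.0036]
```

As a quick check, the indices add up to 4 (one per quarter), as they should when each year's proportions average to 1.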
b

The data is smoothed using a 4-point centred moving average (4CMA), as shown in the table below. Calculate the missing values.

| | Month | Time (t) | Number of air conditioners sold | 4CMA |
|---|---|---|---|---|
| Year 1 | March | 1 | 1042 | |
| | June | 2 | 486 | |
| | Sept | 3 | 613 | 1236.5 |
| | Dec | 4 | 2746 | 1266.625 |
| Year 2 | March | 5 | 1160 | 1347.75 |
| | June | 6 | 609 | A |
| | Sept | 7 | 1139 | 1496.875 |
| | Dec | 8 | 2762 | 1647.75 |
| Year 3 | March | 9 | 1795 | 1713.625 |
| | June | 10 | 1181 | B |
| | Sept | 11 | 1094 | |
| | Dec | 12 | 3380 | |
Worked Solution
Create a strategy

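Use the formula \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}, where a, b, c, d and e are five consecutive data values and the 4CMA is placed at the middle value c.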

Apply the idea

A is the 4CMA for time period 6, so our data values should come from time periods 4 to 8.

A = \dfrac{0.5\times 2746+1160+609+1139+0.5\times 2762}{4} = 1415.5

B is the 4CMA for time period 10, so our data values should come from time periods 8 to 12.

B = \dfrac{0.5\times 2762+1795+1181+1094+0.5\times 3380}{4} = 1785.25
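The 4CMA formula can be applied to the whole series with a short loop. This is a minimal sketch, and the function name `centred_ma4` is hypothetical. Python list index i corresponds to time period t = i + 1, so A is at index 5 and B is at index 9.

```python
def centred_ma4(values):
    """4-point centred moving average.

    Returns a list the same length as `values`; the first two and last two
    positions are None because a full window of five values is not available.
    """
    result = [None] * len(values)
    for i in range(2, len(values) - 2):
        a, b, c, d, e = values[i - 2 : i + 3]
        # Half weight on the two outer values, full weight on the middle three
        result[i] = (0.5 * a + b + c + d + 0.5 * e) / 4
    return result


sales = [1042, 486, 613, 2746, 1160, 609, 1139, 2762, 1795, 1181, 1094, 3380]
cma = centred_ma4(sales)
print(cma[5], cma[9])  # t = 6 (A) and t = 10 (B) → 1415.5 1785.25
```

The None entries mark time periods 1, 2, 11 and 12, which have no 4CMA value in the table.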
c

Use your calculator to find the equation of the least-squares regression line that fits the 4CMA data. Give the equation in the form y=at + b, rounding a and b to four decimal places.

Worked Solution
Create a strategy

Enter the t-values in the first list or column and the matching 4CMA values in the second list or column. Only t=3 to t=10 have a 4CMA value, so time periods 1, 2, 11 and 12 are left out.

Apply the idea

Using your calculator, you should get the following equation: y=84.0193t + 942.6086
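To see what the calculator is doing, we can apply the least-squares formulas directly: the slope is a = S_{ty}/S_{tt}, and the intercept is b = \bar{y} - a\bar{t}. This sketch uses a hypothetical helper called `least_squares`. It includes only the eight time periods that have a 4CMA value.

```python
# Time periods and 4CMA values from the table in part (b), with A and B filled in
t_values = [3, 4, 5, 6, 7, 8, 9, 10]
cma_values = [1236.5, 1266.625, 1347.75, 1415.5, 1496.875, 1647.75, 1713.625, 1785.25]


def least_squares(x, y):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    a = s_xy / s_xx          # slope
    b = y_bar - a * x_bar    # the line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares(t_values, cma_values)
print(round(a, 4), round(b, 4))  # → 84.0193 942.6086
```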

d

Predict the number of air conditioners sold in the quarter ending December year 4. Round your answer to the nearest whole air conditioner sold.

Worked Solution
Create a strategy

Substitute into the regression line equation from part (c), then multiply the result by the December seasonal index from part (a). This puts back the seasonal effect that the 4CMA smoothing removed.

Apply the idea

For December in year 4, the t-value is t=12+4=16, so we substitute this value into the equation from part (c).

y = 84.0193t + 942.6086

y = 84.0193 \times 16 + 942.6086 = 2286.9174

Now we need to multiply this value by the seasonal component for December from part (a) which was 2.0036.

\text{Air conditioners sold} = 2286.9174 \times 2.0036 \approx 4582
e

Comment on the reliability of your prediction.

A
Reliable due to the prediction being made within one cycle of the available data.
B
Unreliable due to the prediction being made beyond one cycle of the available data.
Worked Solution
Create a strategy

Determine whether the prediction made in part (d) is within or beyond one cycle of the data.

Apply the idea

December of year 4 is four quarters (one year) after the last recorded value, December of year 3. One cycle is one year, so the prediction is within one cycle.

The correct option is A.

Example 2

A new pop-up ice-cream shop records its sales over its first month. The data is tabulated below. The shop is open only from Friday to Sunday.

| | Day | Time (t) | Sales (dollars) | Deseasonalised data |
|---|---|---|---|---|
| Week 1 | Fri | 1 | 2036 | 2101.14 |
| | Sat | 2 | 2257 | 2040.87 |
| | Sun | 3 | 1936 | 2092.75 |
| Week 2 | Fri | 4 | 2224 | X |
| | Sat | 5 | 2547 | 2303.10 |
| | Sun | 6 | 2060 | 2226.79 |
| Week 3 | Fri | 7 | 2349 | 2424.15 |
| | Sat | 8 | 2706 | 2446.88 |
| | Sun | 9 | Y | 2431.09 |
| Week 4 | Fri | 10 | 2435 | 2512.90 |
| | Sat | 11 | 2824 | 2553.58 |
| | Sun | 12 | 2398 | 2592.15 |

Seasonal indices:

| Fri | Sat | Sun |
|---|---|---|
| 0.9690 | 1.1059 | 0.9251 |
a

On which day is the shop most likely to need extra help?

A
Saturday
B
Sunday
C
Friday
Worked Solution
Create a strategy

Choose the day with the highest sales each week and the highest seasonal index.

Apply the idea

Saturday has the highest seasonal index and consistently has the highest sales each week. So that is when the shop is the busiest and will need extra help.

The answer is option A.

b

Calculate the value of X in the table. Round the value off to two decimal places if necessary.

Worked Solution
Create a strategy

Use the formula \text{Deseasonalised value}=\dfrac{\text{Raw value}}{\text{Seasonal index}}.

Apply the idea

X is the deseasonalised value for Friday of Week 2. This day had sales of $2224, and the seasonal index for Friday is 0.9690.

X = \dfrac{2224}{0.9690} \approx 2295.15
c

Calculate the value of Y in the table. Round the value off to a single decimal place if necessary.

Worked Solution
Create a strategy

Multiply the deseasonalised data by the seasonal index.

Apply the idea

Y is the sales for Sunday of Week 3. This day had a deseasonalised value of $2431.09, and the seasonal index for Sunday is 0.9251.

Y = 2431.09 \times 0.9251 \approx \$2249
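Deseasonalising (part b) and reseasonalising (part c) are inverse operations, so it can help to write them side by side. This is a minimal sketch. The function names `deseasonalise` and `reseasonalise` and the `index` dictionary are hypothetical, and they hold the seasonal indices given for Example 2.

```python
# Seasonal indices from Example 2
index = {"Fri": 0.9690, "Sat": 1.1059, "Sun": 0.9251}


def deseasonalise(raw, season):
    """Remove the seasonal effect: divide the raw value by its seasonal index."""
    return raw / index[season]


def reseasonalise(deseasonalised, season):
    """Put the seasonal effect back: multiply by the seasonal index."""
    return deseasonalised * index[season]


x = deseasonalise(2224, "Fri")     # Friday of Week 2
y = reseasonalise(2431.09, "Sun")  # Sunday of Week 3
print(round(x, 2), round(y, 1))    # → 2295.15 2249.0
```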
d

Using your calculator, determine the equation of the least-squares regression line for the deseasonalised data, where t=1 is Friday of Week 1.

Give the equation of the line in the form y=at + b. Round a and b to two decimal places.

Worked Solution
Create a strategy

Enter the t values in the first list or column of your calculator, and the deseasonalised values in the second list or column.

Apply the idea

Using your calculator, you should get the following equation: y=49.88t + 2010.84

e

Predict the sales for Friday of the sixth week. Give your answer in dollars and round off any figures to two decimal places if needed.

Worked Solution
Create a strategy

Substitute into the regression line equation from part (d), then multiply the result by the Friday seasonal index to reverse the effect of deseasonalisation.

Apply the idea

Each week, the value of t for Friday increases by 3. Friday of Week 4 is t=10, so Friday of the sixth week, two weeks later, has t=10+3+3=16. The seasonal index for Friday is 0.9690.

y = 49.88t + 2010.84

y = 49.88\times 16 + 2010.84 = 2808.92

\text{Sales} = 2808.92\times 0.9690 \approx \$2721.84
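The step of counting forward 3 time periods per week generalises to a simple formula: t = 3(\text{week} - 1) + \text{position of the day}. The sketch below uses this formula and then repeats the prediction. The helper name `time_index` is hypothetical, and the coefficients are the rounded values from part (d).

```python
DAYS = ["Fri", "Sat", "Sun"]  # the shop's three trading days make up one cycle
fri_index = 0.9690            # Friday seasonal index


def time_index(week, day):
    """Convert (week, day) to t, with t = 1 for Friday of Week 1."""
    return len(DAYS) * (week - 1) + DAYS.index(day) + 1


t = time_index(6, "Fri")     # Friday of the sixth week
trend = 49.88 * t + 2010.84  # regression line from part (d)
sales = trend * fri_index    # reseasonalise with the Friday index
print(t, round(trend, 2), round(sales, 2))  # → 16 2808.92 2721.84
```

As a check, `time_index(4, "Fri")` gives 10 and `time_index(3, "Sun")` gives 9, which match the table.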
f

Is the prediction for Friday of the sixth week reliable? Consider whether it falls within one seasonal cycle of the data.

A
Reliable due to the prediction being made within one cycle of the available data.
B
Unreliable due to the prediction being made beyond one cycle of the available data.
Worked Solution
Create a strategy

Determine whether the prediction made in part (e) is within or beyond one cycle of the data.

Apply the idea

Friday of the sixth week occurs two weeks after the last recorded week. One cycle is one week, so the prediction is beyond one cycle of the data.

The correct option is B.

Idea summary

Steps to predict from time series data:

  1. Smooth the data using the appropriate moving average or by deseasonalising the data.

  2. Give each time period a number.

  3. Calculate the equation of the least-squares regression line.

  4. Substitute a future time value into the equation of the least-squares regression line to obtain a predicted score.

  5. Multiply the predicted score by the appropriate seasonal index to factor back in the seasonality of the data.

  6. If asked to comment on the reliability of the prediction, consider whether the future time value is close to the data.

Outcomes

ACMGM092

fit a least-squares line to model long-term trends in time series data
