topic badge

5.02 Moving averages

Lesson

Introduction

Time series data is a type of bivariate data, so it would be useful if we could use the data to make predictions using a linear regression model or line of best fit. However the peaks and troughs make it difficult to see the underlying trend in the data. In order to predict using a least-squares regression line we must smooth the data first.

In this course we have two methods of smoothing the data:

  • Moving averages

  • Deseasonalising using seasonal indices calculated with the average percentage method

Smoothing moving averages

When we say we want to smooth the data, we mean that we remove the peaks and troughs so that we are able to see the underlying trend of the data more clearly.

One method we can use to smooth the data is to calculate the moving averages of the response variable (y) and then plot this against time (t). A linear regression model can then be applied to the soothed set of data and used to analyse underlying trends.

As the name suggests, moving averages are averages of rolling sets of data. The moving average method calculates the average of a set number of points around the given data point and then the moving average is plotted against time. This lowers the peaks and raises the troughs, hence smoothing the original data.

It is important that we select the correct type of moving average based on the number of seasons in the time series cycle. If it is a 5 season cycle we use a 5 point moving average, if there are 3 seasons in a cycle we use a 3 point moving average. These are both examples of odd moving averages because we have an odd number of seasons in the cycle and to calculate the average we divide by an odd number.

If the correct moving average is calculated and graphed we will see a continually increasing, decreasing or stable underlying trend. If the incorrect moving average is calculated we will still have some peaks and troughs in the data.

Examples

Example 1

Consider the Time Series graph drawn below, along with two sets of moving averages.

A time series graph with legends: data, MA7, and MA5. Ask your teacher for more information.
a

Which moving average is most appropriate for this data?

A
5 point moving average
B
7 point moving average
Worked Solution
Create a strategy

Count how many points there are in a row before the graph repeats itself.

Apply the idea
1
2
3
4
5
6
7
8
\text{Time period}
8
10
12
14
16
\text{Data}

There are two large peaks at time periods 1 and 8, so we can count the number of points from peak to peak as shown in the graph.

We count 7 points, so we need a 7 point moving average. The correct answer is option B.

b

Which two is the reason why is 7 point moving average is the most appropriate?

A
It removes the most seasonality.
B
It best smooths the data.
C
It has the same number of points as the original graph.
Worked Solution
Create a strategy

The moving average should always match the number of seasons per cycle in the original data and smooth the data.

Apply the idea

There are more points than 7 in the original graph, so option C is incorrect.

The correct answers are options A and B.

Idea summary

The moving average used should always match the number of seasons per cycle in the original data.

Odd moving averages

Odd moving averages are used for cycles with an odd number of data points. They are named 3MA (3 point moving average), 5MA (5 point moving average), 7MA (7 point moving average), etc.

If we are calculating a 3MA we find the average of response variable for three data points (For example: t=1,\,t=2,\, and t=3) and the result will be plotted against the middle point (t=2). It's not possible to find a 3MA for t=1 as we don't have a data point below t=1.

If we are calculating a 5MA we find the average of the response variable for five data points (For example: t=1,\,t=2,\,t=3,\,t=4 and t=5) and the result will be plotted against the middle point (t=3). It's not possible to find a 5MA for t=1 or t=2 as we don't have enough points below them.

Examples

Example 2

Consider the time series data presented in the table:

\text{Time period}\text{Raw data}\text{3-moving average}\text{5-moving average}
1113.8
2109112.1
3113.4107.497.7
499.7ab
552.686.493.8
610785.391.6
796.3101.890.7
8c9889.5
995.681.488.2
1046.580.986.7
11100.578.6
12d
a

Calculate the value of a correct to one decimal place.

Worked Solution
Create a strategy

Find the mean of the data values for the previous, current and next time periods.

Apply the idea

a is a 3MA for time period 4, so we need to find the mean of the data values for time periods 3,\,4, and 5.

\displaystyle a\displaystyle =\displaystyle \dfrac{113.4+99.7+52.6}{3}Find the mean of the data values
\displaystyle =\displaystyle 88.6Evaluate
b

Calculate the value of b correct to one decimal place.

Worked Solution
Create a strategy

Find the mean of the data values for the 2 previous, current and 2 next time periods.

Apply the idea

b is a 5MA for time period 4, so we need to find the mean of the 5 data values for time periods 2,\, 3,\,4,\, 5, and 6.

\displaystyle b\displaystyle =\displaystyle \dfrac{109+113.4+99.7+52.6+107}{5}Find the mean of the data values
\displaystyle =\displaystyle 96.3Evaluate
c

Solve for the value of c in the table.

Worked Solution
Create a strategy

Use the 3MA for the same time period, as well as the data values before and after.

Apply the idea

The 3MA for time period 8 is 98. This will be equal to the mean of the data for time periods 7, \, 8, and 9.

\displaystyle \dfrac{96.3+c+95.6}{3}\displaystyle =\displaystyle 98Equatate the 3MA to the mean of the data values
\displaystyle \dfrac{191.9+c}{3}\displaystyle =\displaystyle 98Add 96.3 and 95.6
\displaystyle 191.9+c\displaystyle =\displaystyle 294Multiply both sides by 3
\displaystyle c\displaystyle =\displaystyle 102.1Subtract 191.9 from both sides
d

Calculate the value of d correct to one decimal place.

Worked Solution
Create a strategy

Use the 3MA for the previous time period, as well as the data values from the previous two time periods.

Apply the idea

The 3MA for time period 11 is 78.6. This will be equal to the mean of the data for time periods 10, \, 11, and 12.

\displaystyle \dfrac{46.5+100.5+d}{3}\displaystyle =\displaystyle 78.6Equatate the 3MA to the mean of the data values
\displaystyle \dfrac{147+d}{3}\displaystyle =\displaystyle 78.6Add 46.5 and 100.5
\displaystyle 147+d\displaystyle =\displaystyle 235.8Multiply both sides by 3
\displaystyle d\displaystyle =\displaystyle 88.8Subtract 147 from both sides
e

Determine which moving average best smooths the data?

Worked Solution
Create a strategy

The use of moving average should always match the number of seasons per cycle in the original data and choose the smoothest data.

Apply the idea
1
2
3
4
5
6
7
8
9
10
11
\text{Time period}
55
65
75
85
95
105
y

By graphing the original data, we can see that there are 5 points in one season, by counting from trough to trough.

So the 5-moving average is most appropriate.

Idea summary

To find the 3-moving average for a particular time period, we find the mean of the data values for that time period, the previous time period, and the next time period.

To find the 5-moving average for a particular time period, we find the mean of the data values for that time period, the 2 previous time periods, and the next 2 time periods.

To find the 7-moving average for a particular time period, we find the mean of the data values for that time period, the 3 previous time periods, and the next 3 time periods.

Even moving averages

Even moving averages are used for cycles with an number of data points. We will divide by an even number when calculating the average.

If we calculate an even moving average in the same way we calculated an odd moving average above, we run into a problem. Consider the table below:

\text{Time period}\text{Raw data}4\text{MA}
1119
297
?108.25
386
4131

If we find the average of the first four response variables in the table we get \dfrac{119+97+86+131}{4}=108.25

But where do we put it? The centre of t=1,\,t=2,\,t=3, and t=4 is t=2.5, but we don't have a t=2.5.

We solve the problem using a centred moving average. To calculate the 4 point centred moving average at t=3 we use the first 5 scores. We add half of the first (t=1) value and half of the fifth (t=5) value with the middle three values and divide by four.

\text{Time period}\text{Raw data}4\text{CMA}
1119
297
386106.25
4131
5105

Thus, the 4 point centred moving average (4CMA) is calculated as follows: \begin{aligned} \text{4CMA} =& \dfrac{119\times 0.5+97+86+131+105\times 0.5}{4} \\ =& 106.25\end{aligned}

Even centred moving averages are named 4CMA (4 point centred moving average), 6CMA (6 point centred moving average), 8CMA etc. In order to calculate the average for a particular time period (n) we must use (n+1) data points.

If we have five data points a,\,b,\,c,\,d and e: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}. By taking a half of the first and the last data point, this counts as only one data point and effectively we have only used 4.

If we have five data points a,\,b,\,c,\,d,\,e,\,f and g: \text{6CMA}=\dfrac{0.5a+b+c+d+e+f+0.5g}{6}.

Examples

Example 3

Consider the time series data presented in the table.

\text{Time period}\text{Raw data}\text{4 point centred}\\\text{moving average}\text{6 point centred}\\\text{moving average}
189
2102.5
393.598.09
4111ab
581.79494.77
692.692.0993.08
787.989.9688.96
8c87.5685.93
974.484.5485.79
1080.782.48
1175.6
12d
a

Calculate the value of a in the table. Round your answer to two decimal places.

Worked Solution
Create a strategy

Use formula: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}.

Apply the idea

a is the 4CMA for time period 4, so our data values should come from time periods 2 to 6.

\displaystyle a\displaystyle =\displaystyle \dfrac{0.5\times102.5+93.5+111+81.7+0.5\times 92.6}{4}Substitute the 5 data values
\displaystyle =\displaystyle 95.94Evaluate
b

Calculate the value of b in the table. Round your answer to two decimal places.

Worked Solution
Create a strategy

Use formula: \text{6CMA}=\dfrac{0.5a+b+c+d+e+f+0.5g}{6}.

Apply the idea

b is the 6CMA for time period 4, so our data values should come from time periods 1 to 7.

\displaystyle b\displaystyle =\displaystyle \dfrac{0.5\times89+102.5+93.5+111+81.7+92.6+0.5\times 87.9}{6}Substitute the 7 data values
\displaystyle =\displaystyle 94.96Evaluate
c

Calculate the value of c in the table. Round your answer to one decimal place.

Worked Solution
Create a strategy

Use formula: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}.

Apply the idea

The 4CMA for time period 8 is 87.56. We can substitute this into the above formula along with the data values for time periods 6 to 10.

\displaystyle \dfrac{0.5\times 92.6+87.9+c+74.4+0.5\times 80.7}{4}\displaystyle =\displaystyle 87.56Substitute the values and 4CMA
\displaystyle \dfrac{c+248.95}{4}\displaystyle =\displaystyle 87.56Simplify the numerator
\displaystyle c+248.95\displaystyle =\displaystyle 350.24Multiply both sides by 4
\displaystyle c\displaystyle =\displaystyle 101.3Subtract 248.95 from both sides
d

Calculate the value of d in the table. Round your answer to one decimal place.

Worked Solution
Create a strategy

Use formula: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}.

Apply the idea

The 4CMA for time period 10 is 82.48. We can use this in the formula along with the data values for time periods 8 to 12.

\displaystyle \dfrac{0.5\times 101.3+74.4+80.7+75.6+0.5\times d}{4}\displaystyle =\displaystyle 82.48Substitute the values and 4CMA
\displaystyle \dfrac{281.35+0.5d}{4}\displaystyle =\displaystyle 82.48Simplify the numerator
\displaystyle 281.35+0.5d\displaystyle =\displaystyle 329.92Multiply both sides by 4
\displaystyle 0.5d\displaystyle =\displaystyle 48.57Subtract 281.35 from both sides
\displaystyle d\displaystyle =\displaystyle 97.1Divide both sides by 0.5
Idea summary

Formula for even centred moving averages:

If we have five data points a,\,b,\,c,\,d and e: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}

If we have five data points a,\,b,\,c,\,d,\,e,\,f and g: \text{6CMA}=\dfrac{0.5a+b+c+d+e+f+0.5g}{6}

Spreadsheets with time series data

Spreadsheets are a powerful tool for calculating moving averages and graphing both the original and smoothed data sets.

Follow the instructions to calculate a 3MA for the data below using a spreadsheet and then use your spreadsheet to graph both the original data and the smoothed data.

Note: The instructions shown use a particular spreadsheet software, the notation and behaviour of your spreadsheet may differ slightly from what is presented if you are using a different program. If in doubt, ask your teacher.

\text{Time}\text{Cost}
110
220
315
425
529
624
731
835
930

Step 1: Use the first row to enter titles for the columns - Time for column A, Cost for column B and 3MA for column C. Then enter the time and cost data in their appropriate columns.

The image shows a spreadsheet with 3 columns: A for Time, B for Cost, and C for 3MA. Ask your teacher for more information.

Step 2: Enter the formula for the moving average into Column C.

  • In cell C3 type the formula for the 3 point moving average using the cell reference of the first three cost values as follows: =(B2+B3+B4)/3

  • Note that formulas in spreadsheets always start with an equals sign.

    The image shows a spreadsheet with 3 columns: A for Time, B for Cost, and C for 3MA. Ask your teacher for more information.
  • Press enter and the value of the first 3 point average, 15, will appear.

Step 3: Fill down the formula the formula from cell C3 to cell C9

  • Click on cell C2 to highlight it.

  • Click and drag from the bottom right hand corner to extent to highlighted region down the column to cell C9.

  • The moving averages will now be displayed.

The image shows a spreadsheet with 3 columns: A for Time, B for Cost, and C for 3MA. Ask your teacher for more information.

Step 4: Graph the data

  • Highlight the entire table of data.

  • Select the Insert toolbar.

  • From Charts select a scatter graph with straight line connectors.

  • Both the original data and the smoothed data will be graphed on the same axes.

  • Adjust format such as labelling.

A graph with legends: Cost and 3MA. Labelled with cost and time. Ask your teacher for more information.

Examples

Example 4

Justin weighs himself every 4 months and plots his weight on the graph below.

A graph shows weights from January 2017 to October 2019. Ask your teacher for more information.
a

Which moving average should Justin use if he wishes to smooth the data in order to make predictions and have the moving average values plotted at each month and not between months?

A
3MA
B
3CMA
C
4MA
D
4CMA
Worked Solution
Create a strategy

Count how many points there are in a row before the graph repeats itself.

Apply the idea

From the peak at Jan '17 to the before the peak at Jan '18 there are 4 data points.

The correct answer is option D: 4CMA.

b

Justin enters the following incorrect formula into cell \text{D3} in his spreadsheet in order to calculate a 4 point centred moving average:

Write the correct formula for \text{D3}.

Worked Solution
Create a strategy

Use formula: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}.

Apply the idea

Justin onlu used 4 data values when he should have used 5. He should have also used the value in cell \text{F2}.

The correct formula for \text{D3} is: \text{4CMA}=(\text{B2/2 + C2 + D2 + E2 + F2}/2)/4

c

Calculate the correct value for the 4CMA for cell \text{D3}. Round your answer to one decimal places.

Worked Solution
Create a strategy

Use the formula from part (b).

Apply the idea
\displaystyle \text{4CMA}\displaystyle =\displaystyle (\text{B2/2 + C2 + D2 + E2 + F2}/2)/4Write the formula from part (b)
\displaystyle =\displaystyle \dfrac{\dfrac{81}{2}+78+75+74+\dfrac{80}{2}}{4}Substitute the values in \text{B2},\,\text{C2},\,\text{D2},\,\text{E2},\,\text{F2}
\displaystyle =\displaystyle 76.9Evaluate
Idea summary

By using a spreadsheet, we can easily calculate the moving average and graph the data sets.

Outcomes

ACMGM089

smooth time series data by using a simple moving average, including the use of spreadsheets to implement this process

What is Mathspace

About Mathspace