
4.02 Smoothing with moving means

Lesson

Introduction

Time series data is a type of bivariate data, and it is typically used to make some sort of prediction. However, its fluctuating nature makes this difficult: if a least squares line were fitted to the data, the correlation would often be weak. To help identify trends and make predictions, a process called smoothing can be used. Smoothing reduces the peaks and troughs in the data so that the underlying trend can be seen more easily.

Moving means

As the name suggests, the means of successive sets of data values are calculated and then plotted in place of the original data, which produces a smoothed graph.

The following demonstrates how to calculate what are called a three-point moving mean and a five-point moving mean. This strategy is sometimes referred to as a moving average (MA).
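In symbols, the two moving means can be written as below. The notation y_t for the raw data value at time period t is an assumption introduced here for illustration; it is not used elsewhere in this lesson.

```latex
\text{3MA}_t = \dfrac{y_{t-1} + y_t + y_{t+1}}{3}
\qquad
\text{5MA}_t = \dfrac{y_{t-2} + y_{t-1} + y_t + y_{t+1} + y_{t+2}}{5}
```

Because each formula needs values on both sides of time period t, the first and last time periods in a table have no three-point moving mean, and the first two and last two have no five-point moving mean.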

Examples

Example 1

Consider the time series data presented in the table:

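\begin{array}{c|c|c|c}
\text{Time period} & \text{Raw data} & \text{3-moving average} & \text{5-moving average} \\
\hline
1 & 113.8 & & \\
2 & 109 & 112.1 & \\
3 & 113.4 & 107.4 & 97.7 \\
4 & 99.7 & a & b \\
5 & 52.6 & 86.4 & 93.8 \\
6 & 107 & 85.3 & 91.6 \\
7 & 96.3 & 101.8 & 90.7 \\
8 & c & 98 & 89.5 \\
9 & 95.6 & 81.4 & 88.2 \\
10 & 46.5 & 80.9 & 86.7 \\
11 & 100.5 & 78.6 & \\
12 & d & &
\end{array}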
a

Calculate the value of a correct to one decimal place.

Worked Solution
Create a strategy

Find the mean of the data values for the previous, current and next time periods.

Apply the idea

a is a 3MA for time period 4, so we need to find the mean of the data values for time periods 3,\,4, and 5.

\begin{aligned}
a &= \dfrac{113.4+99.7+52.6}{3} && \text{Find the mean of the data values} \\
&= 88.6 && \text{Evaluate}
\end{aligned}
b

Calculate the value of b correct to one decimal place.

Worked Solution
Create a strategy

Find the mean of the data values for the 2 previous, current and 2 next time periods.

Apply the idea

b is a 5MA for time period 4, so we need to find the mean of the 5 data values for time periods 2,\, 3,\,4,\, 5, and 6.

\begin{aligned}
b &= \dfrac{109+113.4+99.7+52.6+107}{5} && \text{Find the mean of the data values} \\
&= 96.3 && \text{Evaluate}
\end{aligned}
c

Solve for the value of c in the table.

Worked Solution
Create a strategy

Use the 3MA for the same time period, as well as the data values before and after.

Apply the idea

The 3MA for time period 8 is 98. This will be equal to the mean of the data for time periods 7, \, 8, and 9.

\begin{aligned}
\dfrac{96.3+c+95.6}{3} &= 98 && \text{Equate the 3MA to the mean of the data values} \\
\dfrac{191.9+c}{3} &= 98 && \text{Add 96.3 and 95.6} \\
191.9+c &= 294 && \text{Multiply both sides by 3} \\
c &= 102.1 && \text{Subtract 191.9 from both sides}
\end{aligned}
d

Calculate the value of d correct to one decimal place.

Worked Solution
Create a strategy

Use the 3MA for the previous time period, as well as the data values from the previous two time periods.

Apply the idea

The 3MA for time period 11 is 78.6. This will be equal to the mean of the data for time periods 10, \, 11, and 12.

\begin{aligned}
\dfrac{46.5+100.5+d}{3} &= 78.6 && \text{Equate the 3MA to the mean of the data values} \\
\dfrac{147+d}{3} &= 78.6 && \text{Add 46.5 and 100.5} \\
147+d &= 235.8 && \text{Multiply both sides by 3} \\
d &= 88.8 && \text{Subtract 147 from both sides}
\end{aligned}
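The rearrangement used for c and d follows the same pattern each time: multiply the moving mean by the number of points, then subtract the known values. A minimal Python sketch of that pattern is shown here; the function name missing_from_mean is hypothetical and chosen only for this illustration.

```python
def missing_from_mean(mean, known_values, n):
    """Solve (sum(known_values) + x) / n = mean for the unknown x."""
    return mean * n - sum(known_values)


# Part c: the 3MA for time period 8 is 98, with known neighbours 96.3 and 95.6
c = round(missing_from_mean(98, [96.3, 95.6], 3), 1)

# Part d: the 3MA for time period 11 is 78.6, with known values 46.5 and 100.5
d = round(missing_from_mean(78.6, [46.5, 100.5], 3), 1)

print(c, d)  # 102.1 88.8
```

Since the moving means in the table are already rounded, a value recovered this way is only as accurate as that rounding allows.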
e

Which moving average best smooths the data?

A
3 moving average
B
5 moving average
Worked Solution
Create a strategy

The moving average used should match the number of seasons per cycle in the original data, and it should give the smoothest result.

Apply the idea
A time series graph of the raw data, with Time period on the horizontal axis and y on the vertical axis.

By graphing the original data, we can see that there are 5 points in one cycle, counting from trough to trough.

So the 5-moving average is most appropriate.

The answer is option B.

Idea summary

To find the 3-moving average for a particular time period, we find the mean of the data values for that time period, the previous time period, and the next time period.

To find the 5-moving average for a particular time period, we find the mean of the data values for that time period, the 2 previous time periods, and the next 2 time periods.
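As a rough check on the two rules above, the following Python sketch computes a moving mean for any odd number of points. The function name moving_mean and its return of None when there are too few neighbours are assumptions made for this illustration.

```python
def moving_mean(values, n, index):
    """Return the n-point moving mean centred on values[index] (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("n must be odd; an even n needs centring")
    half = n // 2
    if index - half < 0 or index + half >= len(values):
        return None  # not enough data values on one side
    window = values[index - half : index + half + 1]
    return sum(window) / n


# Raw data for time periods 1 to 7 from Example 1
raw = [113.8, 109, 113.4, 99.7, 52.6, 107, 96.3]

# Time period 4 sits at list index 3, because Python lists start at 0
three_ma = round(moving_mean(raw, 3, 3), 1)
five_ma = round(moving_mean(raw, 5, 3), 1)

print(three_ma, five_ma)  # 88.6 96.3
```

These match the values of a and b found in Example 1.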

Number of points used

The number of points chosen affects how the data will be smoothed. It is not always the case that a greater number of points better smooths the data. Consider the graph below.

A time series graph with legends: Raw Data, 3MA, and 5MA. Ask your teacher for more information.

From the graph, the three-point mean is a far smoother line graph when compared with the five-point mean line graph, which still appears to have some degree of seasonality with its peaks and troughs.

The best way to determine the number of points to use for a moving mean is to count the number of points or seasons per cycle. Counting from one peak to the next peak, it can be seen that there are three points per cycle. Therefore a three-point moving mean will most likely be the best to smooth the data - this is because it will account for each of the three seasons present in the data.
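Counting from peak to peak can also be sketched in code. The Python example below is only an illustration under simple assumptions: a peak is taken to be any value strictly greater than both of its neighbours, the function names are hypothetical, and the data list is invented for this sketch rather than taken from the graph above.

```python
def peak_positions(values):
    """Return the list indices of values strictly greater than both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


def points_per_cycle(values):
    """Return the gaps between consecutive peaks."""
    peaks = peak_positions(values)
    return [later - earlier for earlier, later in zip(peaks, peaks[1:])]


# Hypothetical data, invented for illustration, that repeats every 3 time periods
data = [10, 14, 11, 11, 15, 12, 12, 16, 13]

print(points_per_cycle(data))  # [3, 3]
```

A repeated gap of 3 suggests three points per cycle, so a three-point moving mean would be the natural choice for this data. With irregular real data the gaps may not all agree, so the result should be checked against the graph.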

Examples

Example 2

Consider the Time Series graph drawn below, along with two sets of moving averages.

A time series graph with legends: data, MA3, and MA5. Ask your teacher for more information.
a

Which moving average is most appropriate for this data?

A
5 point moving average
B
3 point moving average
Worked Solution
Create a strategy

Count how many points there are in a row before the graph repeats itself.

Apply the idea
A time series graph of the data, with Time period on the horizontal axis and Data on the vertical axis.

There are two large peaks at time periods 0 and 5, so we can count the number of points from peak to peak as shown in the graph.

We count 5 points, so we need a 5 point moving average. The correct answer is option A.

b

Why is the 5 point moving average the most appropriate?

A
It best smooths the data.
B
It matches the number of points in the graph.
Worked Solution
Create a strategy

The moving average should always match the number of seasons per cycle in the original data and smooth the data.

Apply the idea

There are more than 5 points in the original graph, so option B is incorrect.

The correct answer is option A.

Idea summary

The moving average used should always match the number of seasons per cycle in the original data.

Centring with even number of data points

The above examples used an odd number of points when calculating the means. Calculating the moving means for an even number of points requires the use of a process called centring.

If there are five data points: a, b, c, d, and e:

then a \text{four-point moving mean}=\dfrac{\dfrac12a+b+c+d+\dfrac12e}{4}

Taking half of the first data point and half of the last data point means that together they count as only one data point, so effectively only 4 data points are used.
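One way to see where the half weights come from is to average two neighbouring four-point means. The mean of a, b, c, d sits between b and c, and the mean of b, c, d, e sits between c and d, so averaging the two gives a value centred on c:

```latex
\dfrac{1}{2}\left(\dfrac{a+b+c+d}{4} + \dfrac{b+c+d+e}{4}\right)
= \dfrac{a+2b+2c+2d+e}{8}
= \dfrac{\dfrac12 a+b+c+d+\dfrac12 e}{4}
```

The last expression is the same as the four-point moving mean formula given above.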

Likewise, if there are seven data points: a, b, c, d, e, f, and g:

then a \text{six-point moving mean}=\dfrac{\dfrac12a+b+c+d+e+f+\dfrac12g}{6}

Examples

Example 3

Consider the time series data presented in the table.

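\begin{array}{c|c|c|c}
\text{Time period} & \text{Raw data} & \text{4 point centred moving average} & \text{6 point centred moving average} \\
\hline
1 & 89 & & \\
2 & 102.5 & & \\
3 & 93.5 & 98.09 & \\
4 & 111 & a & b \\
5 & 81.7 & 94 & 94.77 \\
6 & 92.6 & 92.09 & 93.08 \\
7 & 87.9 & 89.96 & 88.96 \\
8 & c & 87.56 & 85.93 \\
9 & 74.4 & 84.54 & 85.79 \\
10 & 80.7 & 82.48 & \\
11 & 75.6 & & \\
12 & d & &
\end{array}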
a

Calculate the value of a in the table. Round your answer to two decimal places.

Worked Solution
Create a strategy

Use formula: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}.

Apply the idea

a is the 4CMA for time period 4, so our data values should come from time periods 2 to 6.

\begin{aligned}
a &= \dfrac{0.5\times 102.5+93.5+111+81.7+0.5\times 92.6}{4} && \text{Substitute the 5 data values} \\
&= 95.94 && \text{Evaluate}
\end{aligned}
b

Calculate the value of b in the table. Round your answer to two decimal places.

Worked Solution
Create a strategy

Use formula: \text{6CMA}=\dfrac{0.5a+b+c+d+e+f+0.5g}{6}.

Apply the idea

b is the 6CMA for time period 4, so our data values should come from time periods 1 to 7.

\begin{aligned}
b &= \dfrac{0.5\times 89+102.5+93.5+111+81.7+92.6+0.5\times 87.9}{6} && \text{Substitute the 7 data values} \\
&= 94.96 && \text{Evaluate}
\end{aligned}
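The calculations for a and b can be checked with a short Python sketch of the centred moving mean for an even number of points. The function name centred_moving_mean and its return of None when there are too few neighbours are assumptions made for this illustration.

```python
def centred_moving_mean(values, n, index):
    """Return the n-point centred moving mean at values[index] (n must be even)."""
    if n % 2 != 0:
        raise ValueError("n must be even; an odd n needs no centring")
    half = n // 2
    if index - half < 0 or index + half >= len(values):
        return None  # not enough data values on one side
    window = values[index - half : index + half + 1]  # n + 1 data values
    # The first and last values each get half weight, the rest full weight
    total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
    return total / n


# Raw data for time periods 1 to 7 from Example 3
raw = [89, 102.5, 93.5, 111, 81.7, 92.6, 87.9]

# Time period 4 sits at list index 3, because Python lists start at 0
a = round(centred_moving_mean(raw, 4, 3), 2)
b = round(centred_moving_mean(raw, 6, 3), 2)

print(a, b)  # 95.94 94.96
```

Note that an n-point centred moving mean always reads n + 1 data values, which is why the 4CMA uses time periods 2 to 6 and the 6CMA uses time periods 1 to 7.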
c

Calculate the value of c in the table. Round your answer to one decimal place.

Worked Solution
Create a strategy

Use formula: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}.

Apply the idea

The 4CMA for time period 8 is 87.56. We can substitute this into the above formula along with the data values for time periods 6 to 10.

\begin{aligned}
\dfrac{0.5\times 92.6+87.9+c+74.4+0.5\times 80.7}{4} &= 87.56 && \text{Substitute the values and 4CMA} \\
\dfrac{c+248.95}{4} &= 87.56 && \text{Simplify the numerator} \\
c+248.95 &= 350.24 && \text{Multiply both sides by 4} \\
c &= 101.3 && \text{Subtract 248.95 from both sides}
\end{aligned}
d

Calculate the value of d in the table. Round your answer to one decimal place.

Worked Solution
Create a strategy

Use formula: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}.

Apply the idea

The 4CMA for time period 10 is 82.48. We can use this in the formula along with the data values for time periods 8 to 12.

\begin{aligned}
\dfrac{0.5\times 101.3+74.4+80.7+75.6+0.5\times d}{4} &= 82.48 && \text{Substitute the values and 4CMA} \\
\dfrac{281.35+0.5d}{4} &= 82.48 && \text{Simplify the numerator} \\
281.35+0.5d &= 329.92 && \text{Multiply both sides by 4} \\
0.5d &= 48.57 && \text{Subtract 281.35 from both sides} \\
d &= 97.1 && \text{Divide both sides by 0.5}
\end{aligned}
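Parts c and d differ only in the weight on the unknown: c is a middle value with full weight, while d is an end value with half weight. The Python sketch below handles both cases; the function name missing_from_weighted_mean and the (weight, value) pair layout are assumptions made for this illustration.

```python
def missing_from_weighted_mean(mean, known_terms, unknown_weight, n):
    """Solve (sum of weight * value + unknown_weight * x) / n = mean for x.

    known_terms is a list of (weight, value) pairs for the known data values.
    """
    known_total = sum(weight * value for weight, value in known_terms)
    return (mean * n - known_total) / unknown_weight


# Part c: 4CMA for time period 8 is 87.56; c is a middle value (weight 1)
c = round(
    missing_from_weighted_mean(
        87.56, [(0.5, 92.6), (1, 87.9), (1, 74.4), (0.5, 80.7)], 1, 4
    ),
    1,
)

# Part d: 4CMA for time period 10 is 82.48; d is the last value (weight 0.5)
d = round(
    missing_from_weighted_mean(
        82.48, [(0.5, 101.3), (1, 74.4), (1, 80.7), (1, 75.6)], 0.5, 4
    ),
    1,
)

print(c, d)  # 101.3 97.1
```

The value of d depends on the rounded value of c found first, so any rounding in c carries through to d.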
e

Which centred moving average best smooths the data?

A
4-point centred moving average
B
6-point centred moving average
Worked Solution
Create a strategy

The moving average used should match the number of seasons per cycle in the original data, and it should give the smoothest result.

Apply the idea
A time series graph of the raw data, with Time period on the horizontal axis and y on the vertical axis.

By graphing the original data, we can see that there are 4 points in one cycle, counting from trough to trough (the troughs at time periods 5 and 9).

So the 4-point centred moving average is most appropriate.

The answer is option A.

Idea summary

If there are five data points: a, b, c, d, and e:

then a \text{four-point moving mean}=\dfrac{\dfrac12a+b+c+d+\dfrac12e}{4}

Taking half of the first data point and half of the last data point means that together they count as only one data point, so effectively only 4 data points are used.

Likewise, if there are seven data points: a, b, c, d, e, f, and g:

then a \text{six-point moving mean}=\dfrac{\dfrac12a+b+c+d+e+f+\dfrac12g}{6}

Outcomes

U3.AoS1.28

identify key qualitative features of a time series plot including trend (using smoothing if necessary), seasonality, irregular fluctuations and outliers, and interpret these in the context of the data
