topic badge

5.02 Moving averages

Lesson

Time series data is a type of bivariate data, so it would be useful if we could use the data to make predictions using a linear regression model or line of best fit. However the peaks and troughs make it difficult to see the underlying trend in the data. In order to predict using a least-squares regression line we must smooth the data first.

In this course we have two methods of smoothing the data:

  • Moving averages
  • Deseasonalising using seasonal indices calculated with the average percentage method

 

Smoothing with moving averages

When we say we want to smooth the data, we mean that we remove the peaks and troughs so that we are able to see the underlying trend of the data more clearly.

One method we can use to smooth the data is to calculate the moving averages of the response variable ($y$y) and then plot time ($t$t) against the moving average values to determine the linear regression model.

As the name suggests, moving averages are averages of rolling sets of data. The moving average method calculates the average of a set number of points around the given data point and then the moving average is plotted against time. This lowers the peaks and raises the troughs, hence smoothing the original data.

It is important that we select the correct type of moving average based on the number of seasons in the time series cycle. If it is a $5$5 season cycle we use a $5$5 point moving average, if there are $3$3 seasons in a cycle we use a $3$3 point moving average. These are both examples of odd moving averages because we have an odd number of seasons in the cycle and to calculate the average we divide by an odd number.

If the correct moving average is calculated and graphed we will see a continually increasing, decreasing or stable underlying trend. If the incorrect moving average is calculated we will still have some peaks and troughs in the data.

 

Calculating an odd moving average

Odd moving averages are used for cycles with an odd number of data points. They are named 3MA ($3$3 point moving average), 5MA ($5$5 point moving average), 7MA ($7$7 point moving average) etc.

If we are calculating a 3MA we find the average of response variable for three data points (For example: $t=1$t=1, $t=2$t=2 and $t=3$t=3) and the result will be plotted against the middle point ($t=2$t=2). It's not possible to find a 3MA for $t=1$t=1 as we don't have a data point below $t=1$t=1.

If we are calculating a 5MA we find the average of the response variable for five data points (For example: $t=1$t=1, $t=2$t=2, $t=3$t=3, $t=4$t=4 and $t=5$t=5) and the result will be plotted against the middle point ($t=3$t=3). It's not possible to find a 5MA for $t=1$t=1 or $t=2$t=2 as we don't have enough points below them.

 

Worked example

Example 1

Calculate the missing values of $x$x and $y$y in the table.

Think: A 3MA is calculated by taking the mean of three numbers. The 3MA for time period $2$2 is calculated by finding the mean of the raw data values for time periods $1$1, $2$2 and $3$3.

So the first 3MA in the table was calculated as follows: $\frac{119+97+86}{3}=100.\overline{6}$119+97+863=100.6

Similarly, the first 5MA in the table, for time period $3$3 was calculated by finding the mean of the data for time periods $1$1, $2$2, $3$3, $4$4 and $5$5 as follows: $\frac{119+97+86+131+105}{5}=107.6$119+97+86+131+1055=107.6

Do: To find $x$x, we move the average along and calculate the mean of the next three numbers (data for time periods $2$2, $3$3 and $4$4).

$x=\frac{97+86+131}{3}=104.\overline{6}$x=97+86+1313=104.6

To find $y$y, we see that the raw data centre value is $105$105 at $t=5$t=5 and we take the two raw data values above and two below that to get five values.

$y=\frac{86+131+105+93+130}{5}=109$y=86+131+105+93+1305=109

Reflect: The blank cells in the table are due to the fact we cannot calculate a 3MA for time period $1$1 as there is no value above $119$119 in the table. Similarly, it's not possible to calculate a 5MA for time period $1$1 or $2$2 as there is not enough data to do so.

 

Which moving average best smooths the data?

Consider the graph below displaying the data from the table above.

We can see the 3MA is a far more smooth line graph when compared with the 5MA line graph, which still appears to have some degree of seasonality with its peaks and troughs.

If we look at the raw data graph and count the data points from one peak to the next peak, we can see this data has $3$3 points per cycle. If there are $3$3 points per cycle, then a 3MA will be the best moving average to use to smooth or deseasonalise the data–this is because a 3MA will account for each of the three seasons present in the data.

Remember!


The moving average used should always match the number of seasons per cycle in the original data.

 

Calculating an even moving average

Even moving averages are used for cycles with an even number of data points. We will divide by an even number when calculating the average. 

If we calculate an even moving average in the same way we calculated an odd moving average above, we run into a problem. Consider the table below:

Time
Period
Raw
Data
4MA
$1$1 $119$119  
$2$2 $97$97  
$2.5$2.5??   $108.25$108.25
$3$3 $86$86  
$4$4 $131$131  

If we find the average of the first four response variables in the table we get $\frac{119+97+86+131}{4}=108.25$119+97+86+1314=108.25

But where do we put it? The centre of $t=1$t=1$t=2$t=2, $t=3$t=3 and $t=4$t=4 is $t=2.5$t=2.5, but we don't have a $t=2.5$t=2.5!

We solve the problem using a centred moving average. To calculate the $4$4 point centred moving average at $t=3$t=3 we use the first $5$5 scores. We add half of the first ($t=1$t=1) value and half of the fifth ($t=5$t=5) value with the middle three values and divide by four.

$\frac{119\times0.5+97+86+131+105\times0.5}{4}=106.5$119×0.5+97+86+131+105×0.54=106.5

Time
Period
Raw
Data
4CMA
$1$1 $119$119  
$2$2 $97$97  
$3$3 $86$86 $106.5$106.5
$4$4 $131$131  
$5$5 $105$105  

 

Calculating an even centred moving average

Even centred moving averages are named 4CMA ($4$4 point centred moving average), 6CMA ($6$6 point centred moving average), 8CMA etc. In order to calculate the average for a particular time period ($n$n) we must use ($n+1$n+1) data points.

Formula for centred moving average

If we have five data points $a$a, $b$b, $c$c, $d$d and $e$e:

$4CMA=\frac{\frac{1}{2}a+b+c+d+\frac{1}{2}e}{4}$4CMA=12a+b+c+d+12e4

By taking a half of the first and the last data point, this counts as only one data point and effectively we have only used $4$4.

If we have seven data points: $a$a, $b$b, $c$c, $d$d, $e$e, $f$f and $g$g:

$6CMA=\frac{\frac{1}{2}a+b+c+d+e+f+\frac{1}{2}g}{6}$6CMA=12a+b+c+d+e+f+12g6

 

Worked example

Example 2

Some raw data is observed over four quarters for three years. Calculate the missing values of $x$x and $y$y.

Think: Notice that because the data has been collected each quarter we have $4$4 data points per cycle, so it makes sense that best moving average to use to smooth the data will be a $4$4 point moving average.

What you can see from the markings in the table is that we actually need to take $5$5 data points, rather than $4$4. We use the full value of the middle $3$3 points and half the value of the $1$1st and $5$5th point.

Do: So to calculate $x$x in the table:

$x$x $=$= $\frac{\frac{1}{2}\times492+520+427+463+\frac{1}{2}\times484}{4}$12×492+520+427+463+12×4844
  $=$= $474.5$474.5

 

To calculate $y$y, a raw data value, we'll have to work backwards from our last 4CMA to calculate it.

$461.625$461.625 $=$= $\frac{\frac{1}{2}\times494+425+462+463+\frac{1}{2}y}{4}$12×494+425+462+463+12y4
$1846.5$1846.5 $=$= $\frac{1}{2}y+1597$12y+1597
$249.5$249.5 $=$= $\frac{1}{2}y$12y
$y$y $=$= $499$499

 

Take a look at the graph of the raw data and the moving average to see how smooth the data has become. It almost looks like a perfect straight line.

 

Practice questions

Question 1

Consider the Time Series graph drawn below, along with two sets of moving averages.

  1. Which moving average is most appropriate for this data?

    $5$5 point moving average

    A

    $7$7 point moving average

    B
  2. Why is the $7$7 point moving average the most appropriate? Select all reasons that apply.

    It removes the most seasonality.

    A

    It best smooths the data.

    B

    It has the same number of points as the original graph.

    C

Question 2

The table contains Time Series data.

Time Period Raw Data $3$3-Moving Average $5$5-Moving Average
$1$1 $113.8$113.8    
$2$2 $109$109 $112.1$112.1  
$3$3 $113.4$113.4 $107.4$107.4 $97.7$97.7
$4$4 $99.7$99.7 $a$a $b$b
$5$5 $52.6$52.6 $86.4$86.4 $93.8$93.8
$6$6 $107$107 $85.3$85.3 $91.6$91.6
$7$7 $96.3$96.3 $101.8$101.8 $90.7$90.7
$8$8 $c$c $98$98 $89.5$89.5
$9$9 $95.6$95.6 $81.4$81.4 $88.2$88.2
$10$10 $46.5$46.5 $80.9$80.9 $86.7$86.7
$11$11 $100.5$100.5 $78.6$78.6  
$12$12 $d$d    
  1. Calculate the value of $a$a correct to one decimal place.

  2. Calculate the value of $b$b correct to one decimal place.

  3. Solve for the value of $c$c in the table.

  4. Calculate the value of d correct to one decimal place.

  5. Which moving average best smooths the data?

    $3$3-moving average

    A

    $5$5-moving average

    B

Question 3

Consider the time series data presented in the table.

Time Period Raw Data $4$4 Point Centred Moving Average $6$6 Point Centred Moving Average
$1$1 $52.9$52.9    
$2$2 $44.2$44.2    
$3$3 $62.6$62.6 $53.11$53.11  
$4$4 $51.3$51.3 $a$a $b$b
$5$5 $55.8$55.8 $54.39$54.39 $55.08$55.08
$6$6 $46.6$46.6 $55.03$55.03 $55.6$55.6
$7$7 $65.1$65.1 $55.78$55.78 $55.03$55.03
$8$8 $c$c $56.35$56.35 $55.69$55.69
$9$9 $59.2$59.2 $56.78$56.78 $57.84$57.84
$10$10 $47.8$47.8 $57.93$57.93  
$11$11 $67.3$67.3    
$12$12 $d$d    
  1. Calculate the value of $a$a in the table. Round your answer to 2 decimal places.

  2. Calculate the value of $b$b in the table, rounding your answer to 2 decimal places.

  3. Solve for the value of $c$c in the table. Round your answer to 1 decimal place.

  4. Solve for the value of $d$d in the table. Round your answer to 1 decimal place.

 

Using a spreadsheet with time series data

Spreadsheets are a powerful tool for calculating moving averages and graphing both the original and smoothed data sets.

 

Worked example

Example 3

Follow the instructions to calculate a 3MA for the data below using a spreadsheet and then use your spreadsheet to graph both the original data and the smoothed data.

Note: The instructions shown use a particular spreadsheet software, the notation and behaviour of your spreadsheet may differ slightly from what is presented if you are using a different program. If in doubt, ask your teacher!

Time Cost
1 10
2 20
3 15
4 25
5 29
6 24
7 31
8 35
9 30

Step 1: Use the first row to enter titles for the columns - Time for column A,  Cost for column B and 3MA for column C. Then enter the time and cost data in their appropriate columns. 

Step 2: Enter the formula for the moving average into Column C.

  • In cell C3 type the formula for the 3 point moving average using the cell reference of the first three cost values as follows:  

=(B2+B3+B4)/3

  • Note that formulas in spreadsheets always start with an equals sign.

  • Press enter and the value of the first 3 point average, $15$15, will appear.

Step 3: Fill down the formula the formula from cell C3 to cell C9

  • Click on cell C2 to highlight it.
  • Click and drag from the bottom right hand corner to extent to highlighted region down the column to cell C9.
  • The moving averages will now be displayed. 

Step 4: Graph the data

  • Highlight the entire table of data.
  • Select the Insert toolbar.
  • From Charts select a scatter graph with straight line connectors.
  • Both the original data and the smoothed data will be graphed on the same axes.
  • Adjust format such as labelling.

  

These calculations can also be performed in the spreadsheet application of a CAS. Select your brand below to see instructions for calculating and graphing moving averages.

Casio Classpad

Coming soon

TI Nspire

Coming soon

 

Practice question

Question 4

Justin weighs himself every $4$4 months and plots his weight on the graph below.

  1. Which moving average should Justin use if he wishes to smooth the data in order to make predictions and have the moving average values plotted at each month and not between months?

    3MA

    A

    3CMA

    B

    4MA

    C

    4CMA

    D
  2. Justin enters the following formula into cell $D3$D3 in his spreadsheet in order to calculate a $4$4 point centred moving average. Is this correct?

    Yes

    A

    No, it should be $=(B2+C2+D2+E2)/4$=(B2+C2+D2+E2)/4

    B

    No, it should be entered into cell $C3$C3

    C

    No, it must use $5$5 data points so it should be $=(B2/2+C2+D2+E2+F2/2)/4$=(B2/2+C2+D2+E2+F2/2)/4

    D
  3. Calculate the correct value for the 4CMA for cell $D3$D3.

    Round your answer to one decimal place.

Outcomes

4.1.3

smooth time series data by using a simple moving average, including the use of spreadsheets to implement this process

What is Mathspace

About Mathspace