Lesson

Time series data is a type of bivariate data, so it would be useful if we could use the data to make predictions using a linear regression model or line of best fit. However the peaks and troughs make it difficult to see the underlying trend in the data. In order to predict using a least-squares regression line we must smooth the data first.

In this course we have two methods of smoothing the data:

- Moving averages
- Deseasonalising using seasonal indices calculated with the average percentage method

When we say we want to smooth the data, we mean that we remove the peaks and troughs so that we are able to see the underlying trend of the data more clearly.

One method we can use to smooth the data is to calculate the moving averages of the response variable ($y$`y`) and then plot this against time ($t$`t`). A linear regression model can then be applied to the soothed set of data and used to analyse underlying trends.

As the name suggests, moving averages are averages of rolling sets of data. The moving average method calculates the average of a set number of points around the given data point and then the moving average is plotted against time. This lowers the peaks and raises the troughs, hence smoothing the original data.

It is important that we select the correct type of moving average based on the number of seasons in the time series cycle. If it is a $5$5 season cycle we use a $5$5 point moving average, if there are $3$3 seasons in a cycle we use a $3$3 point moving average. These are both examples of odd moving averages because we have an odd number of seasons in the cycle and to calculate the average we divide by an odd number.

If the correct moving average is calculated and graphed we will see a continually increasing, decreasing or stable underlying trend. If the incorrect moving average is calculated we will still have some peaks and troughs in the data.

Odd moving averages are used for cycles with an **odd** number of data points. They are named 3MA ($3$3 point moving average), 5MA ($5$5 point moving average), 7MA ($7$7 point moving average), etc.

If we are calculating a 3MA we find the average of response variable for three data points (For example: $t=1$`t`=1, $t=2$`t`=2 and $t=3$`t`=3) and the result will be plotted against the middle point ($t=2$`t`=2). It's not possible to find a 3MA for $t=1$`t`=1 as we don't have a data point below $t=1$`t`=1.

If we are calculating a 5MA we find the average of the response variable for five data points (For example: $t=1$`t`=1, $t=2$`t`=2, $t=3$`t`=3, $t=4$`t`=4 and $t=5$`t`=5) and the result will be plotted against the middle point ($t=3$`t`=3). It's not possible to find a 5MA for $t=1$`t`=1 or $t=2$`t`=2 as we don't have enough points below them.

Calculate the missing values of $x$`x` and $y$`y` in the table.

**Think:** A 3MA is calculated by taking the mean of **three** numbers. The 3MA for time period $2$2 is calculated by finding the mean of the raw data values for time periods $1$1, $2$2 and $3$3.

So the first 3MA in the table was calculated as follows: $\frac{119+97+86}{3}=100.\overline{6}$119+97+863=100.6

Similarly, the first 5MA in the table, for time period $3$3 was calculated by finding the mean of the data for time periods $1$1, $2$2, $3$3, $4$4 and $5$5 as follows: $\frac{119+97+86+131+105}{5}=107.6$119+97+86+131+1055=107.6

**Do:** To find $x$`x`, we move the average along and calculate the mean of the next three numbers (data for time periods $2$2, $3$3 and $4$4).

$x=\frac{97+86+131}{3}=104.\overline{6}$`x`=97+86+1313=104.6

To find $y$`y`, we see that the raw data centre value is $105$105 at $t=5$`t`=5 and we take the two raw data values above and two below that to get five values.

$y=\frac{86+131+105+93+130}{5}=109$`y`=86+131+105+93+1305=109

**Reflect:** We can see blank cells in the top and bottom of the table. These are due to the fact we require a data value either side of the time period we are calculating an average for to find the $3$3 point moving average and we require two data points either side to calculate the $5$5 point moving average.

Consider the graph below displaying the data from the table above.

We can see the 3MA is a far more smooth line graph when compared with the 5MA line graph, which still appears to have some degree of seasonality with its peaks and troughs.

If we look at the raw data graph and count the data points from one peak to the next peak, we can see this data has $3$3 points per cycle. If there are $3$3 points per cycle, then a 3MA will be the best moving average to use to smooth or deseasonalise the data–this is because a 3MA will account for each of the three seasons present in the data.

Remember!

The moving average used should always match the number of seasons per cycle in the original data.

Even moving averages are used for cycles with an **even** number of data points. We will divide by an even number when calculating the average.

If we calculate an even moving average in the same way we calculated an odd moving average above, we run into a problem. Consider the table below:

Time Period |
Raw Data |
4MA |
---|---|---|

$1$1 | $119$119 | |

$2$2 | $97$97 | |

$2.5$2.5?? | $108.25$108.25 | |

$3$3 | $86$86 | |

$4$4 | $131$131 |

If we find the average of the first four response variables in the table we get $\frac{119+97+86+131}{4}=108.25$119+97+86+1314=108.25

But where do we put it? The centre of $t=1$`t`=1, $t=2$`t`=2, $t=3$`t`=3 and $t=4$`t`=4 is $t=2.5$`t`=2.5, but we don't have a $t=2.5$`t`=2.5!

We solve the problem using a centred moving average. To calculate the $4$4 point centred moving average at $t=3$`t`=3 we use the first $5$5 scores. We add half of the first ($t=1$`t`=1) value and half of the fifth ($t=5$`t`=5) value with the middle three values and divide by four. Thus, the $4$4 point centred moving average(4CMA) is calculated as follows:

4CMA | $=$= | $\frac{119\times0.5+97+86+131+105\times0.5}{4}$119×0.5+97+86+131+105×0.54 |

$=$= | $106.5$106.5 |

Time Period |
Raw Data |
4CMA |
---|---|---|

$1$1 | $119$119 | |

$2$2 | $97$97 | |

$3$3 | $86$86 | $106.5$106.5 |

$4$4 | $131$131 | |

$5$5 | $105$105 |

Even centred moving averages are named 4CMA ($4$4 point centred moving average), 6CMA ($6$6 point centred moving average), 8CMA etc. In order to calculate the average for a particular time period ($n$`n`) we must use ($n+1$`n`+1) data points.

Formula for centred moving average

If we have five data points $a$`a`, $b$`b`, $c$`c`, $d$`d` and $e$`e`:

$4CMA=\frac{\frac{1}{2}a+b+c+d+\frac{1}{2}e}{4}$4`C``M``A`=12`a`+`b`+`c`+`d`+12`e`4

By taking a half of the first and the last data point, this counts as only one data point and effectively we have only used $4$4.

If we have seven data points: $a$`a`, $b$`b`, $c$`c`, $d$`d`, $e$`e`, $f$`f` and $g$`g`:

$6CMA=\frac{\frac{1}{2}a+b+c+d+e+f+\frac{1}{2}g}{6}$6`C``M``A`=12`a`+`b`+`c`+`d`+`e`+`f`+12`g`6

Some raw data is observed over four quarters for three years. Calculate the missing values of $x$`x` and $y$`y`.

**Think:** Notice that because the data has been collected each quarter we have $4$4 data points per cycle, so it makes sense that best moving average to use to smooth the data will be a $4$4 point moving average.

What you can see from the markings in the table is that we actually need to take $5$5 data points, rather than $4$4. We use the full value of the middle $3$3 points and half the value of the $1$1st and $5$5th point.

**Do:** So to calculate $x$`x` in the table:

$x$x |
$=$= | $\frac{\frac{1}{2}\times492+520+427+463+\frac{1}{2}\times484}{4}$12×492+520+427+463+12×4844 |

$=$= | $474.5$474.5 |

To calculate $y$`y`, a raw data value, we'll have to work backwards from the 4CMA for June 2014 to calculate it.

$\frac{\frac{1}{2}\times494+425+462+463+\frac{1}{2}y}{4}$12×494+425+462+463+12y4 |
$=$= | $461.625$461.625 |
Using the formula for the 4CMA with know values substituted in. |

$\frac{1}{2}y+1597$12y+1597 |
$=$= | $1846.5$1846.5 |
Multiply both sides by $4$4. |

$\frac{1}{2}y$12y |
$=$= | $249.5$249.5 |
Take $1597$1597 from both sides. |

$y$y |
$=$= | $499$499 |
Multiply both sides by $2$2. |

Take a look at the graph of the raw data and the moving average to see how smooth the data has become. It almost looks like a perfect straight line.

Consider the Time Series graph drawn below, along with two sets of moving averages.

Which moving average is most appropriate for this data?

$5$5 point moving average

A$7$7 point moving average

B$5$5 point moving average

A$7$7 point moving average

BWhy is the $7$7 point moving average the most appropriate? Select all reasons that apply.

It removes the most seasonality.

AIt best smooths the data.

BIt has the same number of points as the original graph.

CIt removes the most seasonality.

AIt best smooths the data.

BIt has the same number of points as the original graph.

C

The table contains Time Series data.

Time Period | Raw Data | $3$3-Moving Average | $5$5-Moving Average |
---|---|---|---|

$1$1 | $113.8$113.8 | ||

$2$2 | $109$109 | $112.1$112.1 | |

$3$3 | $113.4$113.4 | $107.4$107.4 | $97.7$97.7 |

$4$4 | $99.7$99.7 | $a$a |
$b$b |

$5$5 | $52.6$52.6 | $86.4$86.4 | $93.8$93.8 |

$6$6 | $107$107 | $85.3$85.3 | $91.6$91.6 |

$7$7 | $96.3$96.3 | $101.8$101.8 | $90.7$90.7 |

$8$8 | $c$c |
$98$98 | $89.5$89.5 |

$9$9 | $95.6$95.6 | $81.4$81.4 | $88.2$88.2 |

$10$10 | $46.5$46.5 | $80.9$80.9 | $86.7$86.7 |

$11$11 | $100.5$100.5 | $78.6$78.6 | |

$12$12 | $d$d |

Calculate the value of $a$

`a`correct to one decimal place.Calculate the value of $b$

`b`correct to one decimal place.Solve for the value of $c$

`c`in the table.Calculate the value of

*d*correct to one decimal place.Which moving average best smooths the data?

$3$3-moving average

A$5$5-moving average

B$3$3-moving average

A$5$5-moving average

B

Consider the time series data presented in the table.

Time Period | Raw Data | $4$4 Point Centred Moving Average | $6$6 Point Centred Moving Average |
---|---|---|---|

$1$1 | $52.9$52.9 | ||

$2$2 | $44.2$44.2 | ||

$3$3 | $62.6$62.6 | $53.11$53.11 | |

$4$4 | $51.3$51.3 | $a$a |
$b$b |

$5$5 | $55.8$55.8 | $54.39$54.39 | $55.08$55.08 |

$6$6 | $46.6$46.6 | $55.03$55.03 | $55.6$55.6 |

$7$7 | $65.1$65.1 | $55.78$55.78 | $55.03$55.03 |

$8$8 | $c$c |
$56.35$56.35 | $55.69$55.69 |

$9$9 | $59.2$59.2 | $56.78$56.78 | $57.84$57.84 |

$10$10 | $47.8$47.8 | $57.93$57.93 | |

$11$11 | $67.3$67.3 | ||

$12$12 | $d$d |

Calculate the value of $a$

`a`in the table. Round your answer to 2 decimal places.Calculate the value of $b$

`b`in the table, rounding your answer to 2 decimal places.Solve for the value of $c$

`c`in the table. Round your answer to 1 decimal place.Solve for the value of $d$

`d`in the table. Round your answer to 1 decimal place.

Spreadsheets are a powerful tool for calculating moving averages and graphing both the original and smoothed data sets.

Follow the instructions to calculate a 3MA for the data below using a spreadsheet and then use your spreadsheet to graph both the original data and the smoothed data.

**Note:** The instructions shown use a particular spreadsheet software, the notation and behaviour of your spreadsheet may differ slightly from what is presented if you are using a different program. If in doubt, ask your teacher!

Time | Cost |
---|---|

1 | 10 |

2 | 20 |

3 | 15 |

4 | 25 |

5 | 29 |

6 | 24 |

7 | 31 |

8 | 35 |

9 | 30 |

**Step 1:** Use the first row to enter titles for the columns - Time for column A, Cost for column B and 3MA for column C. Then enter the time and cost data in their appropriate columns.

**Step 2:** Enter the formula for the moving average into Column C.

- In cell C3 type the formula for the 3 point moving average using the cell reference of the first three cost values as follows:

=(B2+B3+B4)/3

- Note that formulas in spreadsheets always start with an equals sign.

- Press enter and the value of the first 3 point average, $15$15, will appear.

**Step 3: **Fill down the formula the formula from cell C3 to cell C9

- Click on cell C2 to highlight it.
- Click and drag from the bottom right hand corner to extent to highlighted region down the column to cell C9.
- The moving averages will now be displayed.

**Step 4:** Graph the data

- Highlight the entire table of data.
- Select the
**Insert**toolbar. - From
**Charts**select a scatter graph with straight line connectors. - Both the original data and the smoothed data will be graphed on the same axes.
- Adjust format such as labelling.

These calculations can also be performed in the spreadsheet application of a CAS. Select your brand below to see instructions for calculating and graphing moving averages.

Casio Classpad

Coming soon

TI Nspire

Coming soon

Justin weighs himself every $4$4 months and plots his weight on the graph below.

Which moving average should Justin use if he wishes to smooth the data in order to make predictions and have the moving average values plotted at each month and not between months?

3MA

A3CMA

B4MA

C4CMA

D3MA

A3CMA

B4MA

C4CMA

DJustin enters the following formula into cell $D3$

`D`3 in his spreadsheet in order to calculate a $4$4 point centred moving average. Is this correct?Yes

ANo, it should be $=(B2+C2+D2+E2)/4$=(

`B`2+`C`2+`D`2+`E`2)/4BNo, it should be entered into cell $C3$

`C`3CNo, it must use $5$5 data points so it should be $=(B2/2+C2+D2+E2+F2/2)/4$=(

`B`2/2+`C`2+`D`2+`E`2+`F`2/2)/4DYes

ANo, it should be $=(B2+C2+D2+E2)/4$=(

`B`2+`C`2+`D`2+`E`2)/4BNo, it should be entered into cell $C3$

`C`3CNo, it must use $5$5 data points so it should be $=(B2/2+C2+D2+E2+F2/2)/4$=(

`B`2/2+`C`2+`D`2+`E`2+`F`2/2)/4DCalculate the correct value for the 4CMA for cell $D3$

`D`3.Round your answer to one decimal place.

smooth time series data by using a simple moving average, including the use of spreadsheets to implement this process