Time series data is a type of bivariate data, so it would be useful if we could use the data to make predictions using a linear regression model or line of best fit. However the peaks and troughs make it difficult to see the underlying trend in the data. In order to predict using a least-squares regression line we must smooth the data first.
In this course we have two methods of smoothing the data:
Moving averages
Deseasonalising using seasonal indices calculated with the average percentage method
When we say we want to smooth the data, we mean that we remove the peaks and troughs so that we are able to see the underlying trend of the data more clearly.
One method we can use to smooth the data is to calculate the moving averages of the response variable (y) and then plot this against time (t). A linear regression model can then be applied to the soothed set of data and used to analyse underlying trends.
As the name suggests, moving averages are averages of rolling sets of data. The moving average method calculates the average of a set number of points around the given data point and then the moving average is plotted against time. This lowers the peaks and raises the troughs, hence smoothing the original data.
It is important that we select the correct type of moving average based on the number of seasons in the time series cycle. If it is a 5 season cycle we use a 5 point moving average, if there are 3 seasons in a cycle we use a 3 point moving average. These are both examples of odd moving averages because we have an odd number of seasons in the cycle and to calculate the average we divide by an odd number.
If the correct moving average is calculated and graphed we will see a continually increasing, decreasing or stable underlying trend. If the incorrect moving average is calculated we will still have some peaks and troughs in the data.
Consider the Time Series graph drawn below, along with two sets of moving averages.
Which moving average is most appropriate for this data?
Which two is the reason why is 7 point moving average is the most appropriate?
The moving average used should always match the number of seasons per cycle in the original data.
Odd moving averages are used for cycles with an odd number of data points. They are named 3MA (3 point moving average), 5MA (5 point moving average), 7MA (7 point moving average), etc.
If we are calculating a 3MA we find the average of response variable for three data points (For example: t=1,\,t=2,\, and t=3) and the result will be plotted against the middle point (t=2). It's not possible to find a 3MA for t=1 as we don't have a data point below t=1.
If we are calculating a 5MA we find the average of the response variable for five data points (For example: t=1,\,t=2,\,t=3,\,t=4 and t=5) and the result will be plotted against the middle point (t=3). It's not possible to find a 5MA for t=1 or t=2 as we don't have enough points below them.
Consider the time series data presented in the table:
\text{Time period} | \text{Raw data} | \text{3-moving average} | \text{5-moving average} |
---|---|---|---|
1 | 113.8 | ||
2 | 109 | 112.1 | |
3 | 113.4 | 107.4 | 97.7 |
4 | 99.7 | a | b |
5 | 52.6 | 86.4 | 93.8 |
6 | 107 | 85.3 | 91.6 |
7 | 96.3 | 101.8 | 90.7 |
8 | c | 98 | 89.5 |
9 | 95.6 | 81.4 | 88.2 |
10 | 46.5 | 80.9 | 86.7 |
11 | 100.5 | 78.6 | |
12 | d |
Calculate the value of a correct to one decimal place.
Calculate the value of b correct to one decimal place.
Solve for the value of c in the table.
Calculate the value of d correct to one decimal place.
Determine which moving average best smooths the data?
To find the 3-moving average for a particular time period, we find the mean of the data values for that time period, the previous time period, and the next time period.
To find the 5-moving average for a particular time period, we find the mean of the data values for that time period, the 2 previous time periods, and the next 2 time periods.
To find the 7-moving average for a particular time period, we find the mean of the data values for that time period, the 3 previous time periods, and the next 3 time periods.
Even moving averages are used for cycles with an number of data points. We will divide by an even number when calculating the average.
If we calculate an even moving average in the same way we calculated an odd moving average above, we run into a problem. Consider the table below:
\text{Time period} | \text{Raw data} | 4\text{MA} |
---|---|---|
1 | 119 | |
2 | 97 | |
? | 108.25 | |
3 | 86 | |
4 | 131 |
We solve the problem using a centred moving average. To calculate the 4 point centred moving average at t=3 we use the first 5 scores. We add half of the first (t=1) value and half of the fifth (t=5) value with the middle three values and divide by four.
\text{Time period} | \text{Raw data} | 4\text{CMA} |
---|---|---|
1 | 119 | |
2 | 97 | |
3 | 86 | 106.25 |
4 | 131 | |
5 | 105 |
Even centred moving averages are named 4CMA (4 point centred moving average), 6CMA (6 point centred moving average), 8CMA etc. In order to calculate the average for a particular time period (n) we must use (n+1) data points.
If we have five data points a,\,b,\,c,\,d and e: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}. By taking a half of the first and the last data point, this counts as only one data point and effectively we have only used 4.
If we have five data points a,\,b,\,c,\,d,\,e,\,f and g: \text{6CMA}=\dfrac{0.5a+b+c+d+e+f+0.5g}{6}.
Consider the time series data presented in the table.
\text{Time period} | \text{Raw data} | \text{4 point centred}\\\text{moving average} | \text{6 point centred}\\\text{moving average} |
---|---|---|---|
1 | 89 | ||
2 | 102.5 | ||
3 | 93.5 | 98.09 | |
4 | 111 | a | b |
5 | 81.7 | 94 | 94.77 |
6 | 92.6 | 92.09 | 93.08 |
7 | 87.9 | 89.96 | 88.96 |
8 | c | 87.56 | 85.93 |
9 | 74.4 | 84.54 | 85.79 |
10 | 80.7 | 82.48 | |
11 | 75.6 | ||
12 | d |
Calculate the value of a in the table. Round your answer to two decimal places.
Calculate the value of b in the table. Round your answer to two decimal places.
Calculate the value of c in the table. Round your answer to one decimal place.
Calculate the value of d in the table. Round your answer to one decimal place.
Formula for even centred moving averages:
If we have five data points a,\,b,\,c,\,d and e: \text{4CMA}=\dfrac{0.5a+b+c+d+0.5e}{4}
If we have five data points a,\,b,\,c,\,d,\,e,\,f and g: \text{6CMA}=\dfrac{0.5a+b+c+d+e+f+0.5g}{6}
Spreadsheets are a powerful tool for calculating moving averages and graphing both the original and smoothed data sets.
Follow the instructions to calculate a 3MA for the data below using a spreadsheet and then use your spreadsheet to graph both the original data and the smoothed data.
Note: The instructions shown use a particular spreadsheet software, the notation and behaviour of your spreadsheet may differ slightly from what is presented if you are using a different program. If in doubt, ask your teacher.
\text{Time} | \text{Cost} |
---|---|
1 | 10 |
2 | 20 |
3 | 15 |
4 | 25 |
5 | 29 |
6 | 24 |
7 | 31 |
8 | 35 |
9 | 30 |
Step 1: Use the first row to enter titles for the columns - Time for column A, Cost for column B and 3MA for column C. Then enter the time and cost data in their appropriate columns.
Step 2: Enter the formula for the moving average into Column C.
In cell C3 type the formula for the 3 point moving average using the cell reference of the first three cost values as follows: =(B2+B3+B4)/3
Note that formulas in spreadsheets always start with an equals sign.
Press enter and the value of the first 3 point average, 15, will appear.
Step 3: Fill down the formula the formula from cell C3 to cell C9
Click on cell C2 to highlight it.
Click and drag from the bottom right hand corner to extent to highlighted region down the column to cell C9.
The moving averages will now be displayed.
Step 4: Graph the data
Highlight the entire table of data.
Select the Insert toolbar.
From Charts select a scatter graph with straight line connectors.
Both the original data and the smoothed data will be graphed on the same axes.
Adjust format such as labelling.
Justin weighs himself every 4 months and plots his weight on the graph below.
Which moving average should Justin use if he wishes to smooth the data in order to make predictions and have the moving average values plotted at each month and not between months?
Justin enters the following incorrect formula into cell \text{D3} in his spreadsheet in order to calculate a 4 point centred moving average:
Write the correct formula for \text{D3}.
Calculate the correct value for the 4CMA for cell \text{D3}. Round your answer to one decimal places.
By using a spreadsheet, we can easily calculate the moving average and graph the data sets.