Time series data is a type of bivariate data, so it would be useful if we could use the data to make predictions using a linear regression model or line of best fit. However the peaks and troughs make it difficult to see the underlying trend in the data. In order to predict using a least-squares regression line we must smooth the data first.
In this course we have two methods of smoothing the data:
When we say we want to smooth the data, we mean that we remove the peaks and troughs so that we are able to see the underlying trend of the data more clearly.
One method we can use to smooth the data is to calculate the moving averages of the response variable ($y$y) and then plot time ($t$t) against the moving average values to determine the linear regression model.
As the name suggests, moving averages are averages of rolling sets of data. The moving average method calculates the average of a set number of points around the given data point and then the moving average is plotted against time. This lowers the peaks and raises the troughs, hence smoothing the original data.
It is important that we select the correct type of moving average based on the number of seasons in the time series cycle. If it is a $5$5 season cycle we use a $5$5 point moving average, if there are $3$3 seasons in a cycle we use a $3$3 point moving average. These are both examples of odd moving averages because we have an odd number of seasons in the cycle and to calculate the average we divide by an odd number.
If the correct moving average is calculated and graphed we will see a continually increasing, decreasing or stable underlying trend. If the incorrect moving average is calculated we will still have some peaks and troughs in the data.
Odd moving averages are used for cycles with an odd number of data points. They are named 3MA ($3$3 point moving average), 5MA ($5$5 point moving average), 7MA ($7$7 point moving average) etc.
If we are calculating a 3MA we find the average of response variable for three data points (For example: $t=1$t=1, $t=2$t=2 and $t=3$t=3) and the result will be plotted against the middle point ($t=2$t=2). It's not possible to find a 3MA for $t=1$t=1 as we don't have a data point below $t=1$t=1.
If we are calculating a 5MA we find the average of the response variable for five data points (For example: $t=1$t=1, $t=2$t=2, $t=3$t=3, $t=4$t=4 and $t=5$t=5) and the result will be plotted against the middle point ($t=3$t=3). It's not possible to find a 5MA for $t=1$t=1 or $t=2$t=2 as we don't have enough points below them.
Calculate the missing values of $x$x and $y$y in the table.
Think: A 3MA is calculated by taking the mean of three numbers. The 3MA for time period $2$2 is calculated by finding the mean of the raw data values for time periods $1$1, $2$2 and $3$3.
So the first 3MA in the table was calculated as follows: $\frac{119+97+86}{3}=100.\overline{6}$119+97+863=100.6
Similarly, the first 5MA in the table, for time period $3$3 was calculated by finding the mean of the data for time periods $1$1, $2$2, $3$3, $4$4 and $5$5 as follows: $\frac{119+97+86+131+105}{5}=107.6$119+97+86+131+1055=107.6
Do: To find $x$x, we move the average along and calculate the mean of the next three numbers (data for time periods $2$2, $3$3 and $4$4).
$x=\frac{97+86+131}{3}=104.\overline{6}$x=97+86+1313=104.6
To find $y$y, we see that the raw data centre value is $105$105 at $t=5$t=5 and we take the two raw data values above and two below that to get five values.
$y=\frac{86+131+105+93+130}{5}=109$y=86+131+105+93+1305=109
Reflect: The blank cells in the table are due to the fact we cannot calculate a 3MA for time period $1$1 as there is no value above $119$119 in the table. Similarly, it's not possible to calculate a 5MA for time period $1$1 or $2$2 as there is not enough data to do so.
Consider the graph below displaying the data from the table above.
We can see the 3MA is a far more smooth line graph when compared with the 5MA line graph, which still appears to have some degree of seasonality with its peaks and troughs.
If we look at the raw data graph and count the data points from one peak to the next peak, we can see this data has $3$3 points per cycle. If there are $3$3 points per cycle, then a 3MA will be the best moving average to use to smooth or deseasonalise the data–this is because a 3MA will account for each of the three seasons present in the data.
The moving average used should always match the number of seasons per cycle in the original data.
Even moving averages are used for cycles with an even number of data points. We will divide by an even number when calculating the average.
If we calculate an even moving average in the same way we calculated an odd moving average above, we run into a problem. Consider the table below:
Time Period |
Raw Data |
4MA |
---|---|---|
$1$1 | $119$119 | |
$2$2 | $97$97 | |
$2.5$2.5?? | $108.25$108.25 | |
$3$3 | $86$86 | |
$4$4 | $131$131 |
If we find the average of the first four response variables in the table we get $\frac{119+97+86+131}{4}=108.25$119+97+86+1314=108.25
But where do we put it? The centre of $t=1$t=1, $t=2$t=2, $t=3$t=3 and $t=4$t=4 is $t=2.5$t=2.5, but we don't have a $t=2.5$t=2.5!
We solve the problem using a centred moving average. To calculate the $4$4 point centred moving average at $t=3$t=3 we use the first $5$5 scores. We add half of the first ($t=1$t=1) value and half of the fifth ($t=5$t=5) value with the middle three values and divide by four.
$\frac{119\times0.5+97+86+131+105\times0.5}{4}=106.5$119×0.5+97+86+131+105×0.54=106.5
Time Period |
Raw Data |
4CMA |
---|---|---|
$1$1 | $119$119 | |
$2$2 | $97$97 | |
$3$3 | $86$86 | $106.5$106.5 |
$4$4 | $131$131 | |
$5$5 | $105$105 |
Even centred moving averages are named 4CMA ($4$4 point centred moving average), 6CMA ($6$6 point centred moving average), 8CMA etc. In order to calculate the average for a particular time period ($n$n) we must use ($n+1$n+1) data points.
If we have five data points $a$a, $b$b, $c$c, $d$d and $e$e:
$4CMA=\frac{\frac{1}{2}a+b+c+d+\frac{1}{2}e}{4}$4CMA=12a+b+c+d+12e4
By taking a half of the first and the last data point, this counts as only one data point and effectively we have only used $4$4.
If we have seven data points: $a$a, $b$b, $c$c, $d$d, $e$e, $f$f and $g$g:
$6CMA=\frac{\frac{1}{2}a+b+c+d+e+f+\frac{1}{2}g}{6}$6CMA=12a+b+c+d+e+f+12g6
Some raw data is observed over four quarters for three years. Calculate the missing values of $x$x and $y$y.
Think: Notice that because the data has been collected each quarter we have $4$4 data points per cycle, so it makes sense that best moving average to use to smooth the data will be a $4$4 point moving average.
What you can see from the markings in the table is that we actually need to take $5$5 data points, rather than $4$4. We use the full value of the middle $3$3 points and half the value of the $1$1st and $5$5th point.
Do: So to calculate $x$x in the table:
$x$x | $=$= | $\frac{\frac{1}{2}\times492+520+427+463+\frac{1}{2}\times484}{4}$12×492+520+427+463+12×4844 |
$=$= | $474.5$474.5 |
To calculate $y$y, a raw data value, we'll have to work backwards from our last 4CMA to calculate it.
$461.625$461.625 | $=$= | $\frac{\frac{1}{2}\times494+425+462+463+\frac{1}{2}y}{4}$12×494+425+462+463+12y4 |
$1846.5$1846.5 | $=$= | $\frac{1}{2}y+1597$12y+1597 |
$249.5$249.5 | $=$= | $\frac{1}{2}y$12y |
$y$y | $=$= | $499$499 |
Take a look at the graph of the raw data and the moving average to see how smooth the data has become. It almost looks like a perfect straight line.
Consider the Time Series graph drawn below, along with two sets of moving averages.
Which moving average is most appropriate for this data?
$5$5 point moving average
$7$7 point moving average
Why is the $7$7 point moving average the most appropriate? Select all reasons that apply.
It removes the most seasonality.
It best smooths the data.
It has the same number of points as the original graph.
The table contains Time Series data.
Time Period | Raw Data | $3$3-Moving Average | $5$5-Moving Average |
---|---|---|---|
$1$1 | $113.8$113.8 | ||
$2$2 | $109$109 | $112.1$112.1 | |
$3$3 | $113.4$113.4 | $107.4$107.4 | $97.7$97.7 |
$4$4 | $99.7$99.7 | $a$a | $b$b |
$5$5 | $52.6$52.6 | $86.4$86.4 | $93.8$93.8 |
$6$6 | $107$107 | $85.3$85.3 | $91.6$91.6 |
$7$7 | $96.3$96.3 | $101.8$101.8 | $90.7$90.7 |
$8$8 | $c$c | $98$98 | $89.5$89.5 |
$9$9 | $95.6$95.6 | $81.4$81.4 | $88.2$88.2 |
$10$10 | $46.5$46.5 | $80.9$80.9 | $86.7$86.7 |
$11$11 | $100.5$100.5 | $78.6$78.6 | |
$12$12 | $d$d |
Calculate the value of $a$a correct to one decimal place.
Calculate the value of $b$b correct to one decimal place.
Solve for the value of $c$c in the table.
Calculate the value of d correct to one decimal place.
Which moving average best smooths the data?
$3$3-moving average
$5$5-moving average
Consider the time series data presented in the table.
Time Period | Raw Data | $4$4 Point Centred Moving Average | $6$6 Point Centred Moving Average |
---|---|---|---|
$1$1 | $52.9$52.9 | ||
$2$2 | $44.2$44.2 | ||
$3$3 | $62.6$62.6 | $53.11$53.11 | |
$4$4 | $51.3$51.3 | $a$a | $b$b |
$5$5 | $55.8$55.8 | $54.39$54.39 | $55.08$55.08 |
$6$6 | $46.6$46.6 | $55.03$55.03 | $55.6$55.6 |
$7$7 | $65.1$65.1 | $55.78$55.78 | $55.03$55.03 |
$8$8 | $c$c | $56.35$56.35 | $55.69$55.69 |
$9$9 | $59.2$59.2 | $56.78$56.78 | $57.84$57.84 |
$10$10 | $47.8$47.8 | $57.93$57.93 | |
$11$11 | $67.3$67.3 | ||
$12$12 | $d$d |
Calculate the value of $a$a in the table. Round your answer to 2 decimal places.
Calculate the value of $b$b in the table, rounding your answer to 2 decimal places.
Solve for the value of $c$c in the table. Round your answer to 1 decimal place.
Solve for the value of $d$d in the table. Round your answer to 1 decimal place.
Spreadsheets are a powerful tool for calculating moving averages and graphing both the original and smoothed data sets.
Follow the instructions to calculate a 3MA for the data below using a spreadsheet and then use your spreadsheet to graph both the original data and the smoothed data.
Note: The instructions shown use a particular spreadsheet software, the notation and behaviour of your spreadsheet may differ slightly from what is presented if you are using a different program. If in doubt, ask your teacher!
Time | Cost |
---|---|
1 | 10 |
2 | 20 |
3 | 15 |
4 | 25 |
5 | 29 |
6 | 24 |
7 | 31 |
8 | 35 |
9 | 30 |
Step 1: Use the first row to enter titles for the columns - Time for column A, Cost for column B and 3MA for column C. Then enter the time and cost data in their appropriate columns.
Step 2: Enter the formula for the moving average into Column C.
=(B2+B3+B4)/3
Step 3: Fill down the formula the formula from cell C3 to cell C9
Step 4: Graph the data
These calculations can also be performed in the spreadsheet application of a CAS. Select your brand below to see instructions for calculating and graphing moving averages.
Casio Classpad
Coming soon
TI Nspire
Coming soon
Justin weighs himself every $4$4 months and plots his weight on the graph below.
Which moving average should Justin use if he wishes to smooth the data in order to make predictions and have the moving average values plotted at each month and not between months?
3MA
3CMA
4MA
4CMA
Justin enters the following formula into cell $D3$D3 in his spreadsheet in order to calculate a $4$4 point centred moving average. Is this correct?
Yes
No, it should be $=(B2+C2+D2+E2)/4$=(B2+C2+D2+E2)/4
No, it should be entered into cell $C3$C3
No, it must use $5$5 data points so it should be $=(B2/2+C2+D2+E2+F2/2)/4$=(B2/2+C2+D2+E2+F2/2)/4
Calculate the correct value for the 4CMA for cell $D3$D3.
Round your answer to one decimal place.