Bivariate data consists of pairs of elements, usually numbers, $(a,b)$. The first number in the pair could be time, and the second could be a quantity that varies with time.
For example, we might measure the temperature every hour over a $24$-hour period. The data set would then consist of $24$ pairs of numbers, with the hour number written first and the corresponding temperature written second in each pair.
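As a minimal sketch, such a data set could be stored as a list of (hour, temperature) pairs. The temperature values below are invented purely for illustration and are not measurements from this lesson; the names `temperatures` and `data` are likewise just illustrative.

```python
# Hypothetical hourly readings in °C (made up for illustration only).
temperatures = [12, 11, 11, 10, 10, 9, 10, 12, 14, 16, 18, 20,
                21, 22, 22, 21, 20, 18, 17, 16, 15, 14, 13, 12]

# Pair each hour number (1 to 24) with its temperature:
# the hour is written first and the temperature second.
data = list(zip(range(1, 25), temperatures))

print(len(data))  # number of pairs in the data set
print(data[0])    # the first (hour, temperature) pair
```

Each element of `data` is one bivariate observation, so the list has one entry per hour.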
The first number in the pair is often called the treatment variable and the second number is the response variable. These are also called the independent and dependent variables respectively.
In looking at a bivariate data set, we may be interested in the average value of the response variable or we might want to know the extent of the variation that occurs over the range of the treatment variable. We may also look for trends in the way the response varies over the values of the treatment variable.
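The two summaries mentioned above, the average response and the extent of its variation, can be sketched in a few lines. The helper name `summarise` and the sample pairs are assumptions made for this example, not part of the lesson's data.

```python
from statistics import mean

def summarise(pairs):
    """Return the mean and the range (max minus min) of the response values."""
    responses = [response for _, response in pairs]
    return mean(responses), max(responses) - min(responses)

# Hypothetical (treatment, response) pairs, invented for illustration.
sample = [(1, 4), (2, 6), (3, 5), (4, 9)]

average, spread = summarise(sample)
print(average, spread)
```

Here the range is used as a simple measure of variation; other measures, such as the standard deviation, could be substituted.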
A trend may lead to predictions about future values of the response variable, or it may prompt a researcher to look for or explain a possible underlying relationship between the variables.
Bivariate data that has time as the independent variable is often called a time series. This term is used particularly when the observations are made over a long period of time. Thus, data sets concerning aspects of climate, or concerning markets varying over time, are referred to as time series.
The following column graph shows monthly average temperatures at Vladivostok.
The columns have been labeled with the names of the months, but they could just as well have been numbered $1$ to $12$. Then the data points would have been the pairs $(1,-12)$, $(2,-9)$, $(3,-2)$, and so on.
If the temperature measurements displayed in this graph represented the measurements taken over just one year, say the year $2017$, then we would be unwise to make a prediction about what the temperature would be in April $2018$. However, if we knew that the graph represented average temperatures derived from measurements recorded over many years, then we would be much more confident about predicting how warm it will be in some future month.
From the chart, it can be seen that the total variation in average temperature over the year is about $33^\circ C$, from $-12^\circ C$ to $21^\circ C$. The chart does not show by how much the temperature can vary within a month or a day. The averaging process thus removes a certain level of detail from the results.
Averaging can also remove what are called outliers. These are individual measurements that are very different from measurements taken just before or after. A single exceptionally hot day in Vladivostok on which the temperature reached $30^\circ C$ would show up as a spike in a graph of daily temperatures but would hardly affect the temperatures averaged over the month.
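The effect of a single outlier on a monthly average can be illustrated with a small sketch. The daily temperatures are hypothetical: a 30-day month is assumed to sit at a steady $18^\circ C$, with one day replaced by a $30^\circ C$ spike.

```python
from statistics import mean

# Hypothetical daily temperatures (°C) for a 30-day month: a steady 18 °C.
typical_month = [18] * 30

# The same month, but with one exceptionally hot day reaching 30 °C.
month_with_spike = typical_month.copy()
month_with_spike[14] = 30

# Change in the monthly average caused by the single hot day.
shift = mean(month_with_spike) - mean(typical_month)
print(round(shift, 2))
```

Under these assumptions the spike is $12^\circ C$ above the usual daily value, yet it moves the monthly average by only about $12/30 = 0.4^\circ C$.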
Statisticians recommend investigating the reasons for outliers when they occur in a data set. Sometimes they are just chance fluctuations or errors in measurement and they can be ignored, but they can also indicate a real effect that needs to be explained.
We should be careful to avoid extrapolating results beyond those that can be reasonably justified from the data.
The following dot-plot shows the first 100 days of a time series.
We might observe that the trend is generally increasing. We can also see that there is an increasing amount of fluctuation of the response values within the general trend, and perhaps the overall rate of increase is beginning to level out as time passes.
In the absence of extra information about the nature of the process being observed, it would be unwise to make a prediction about how the time series would unfold over the next $100$ days. The upward trend might or might not continue, and the fluctuations could increase in intensity or reduce. Anything could happen!
The following diagram shows a possible continuation up to the $200$-day mark.
The following graph shows the number of US dollars that one Australian dollar could buy over a 5-year period.
Source: XE.com
Describe the trend of the data:
Increasing
Decreasing
The graph shows energy usage in two countries over several years. Determine whether each of the following statements is true or false.
From $2004$ to $2012$, the United Kingdom's usage decreased by approximately as much as Australia's usage increased.
False
True
The graph does not give us any information about the types of energy being used.
False
True
The graph does not give us any information about energy use per capita.
False
True
From the beginning of 2012, the number of new houses built in the suburb of Woodford was recorded, with figures released every four months.
The following table contains the data from the beginning of 2012 to the end of 2015:
Time Period | Houses Built |
April 2012 | $103$ |
August 2012 | $92$ |
December 2012 | $105$ |
April 2013 | $99$ |
August 2013 | $88$ |
December 2013 | $104$ |
April 2014 | $93$ |
August 2014 | $85$ |
December 2014 | $103$ |
April 2015 | $93$ |
August 2015 | $83$ |
December 2015 | $96$ |
Plot the number of new houses built in the suburb of Woodford for each $4$-month period for which data was released.
Draw line segments joining consecutive points to represent the overall time series.
What is the overall trend?
Downwards only
Upwards only
Seasonal and Upwards
Seasonal and Downwards
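One way to check an answer to the question above is to separate the long-term movement from the pattern within each year. The sketch below uses the figures from the Woodford table; the choice of yearly totals and of "lowest release period per year" as indicators is an assumption made for this example, not a prescribed method.

```python
# Houses built in Woodford, keyed by year and release month (from the table).
houses = {
    2012: {"April": 103, "August": 92, "December": 105},
    2013: {"April": 99, "August": 88, "December": 104},
    2014: {"April": 93, "August": 85, "December": 103},
    2015: {"April": 93, "August": 83, "December": 96},
}

# Long-term movement: total houses built in each year.
yearly_totals = {year: sum(periods.values()) for year, periods in houses.items()}

# Pattern within each year: the release period with the fewest houses.
lowest_period = {year: min(periods, key=periods.get)
                 for year, periods in houses.items()}

print(yearly_totals)
print(lowest_period)
```

If the yearly totals move steadily in one direction while the same period is the lowest every year, that suggests a seasonal pattern combined with a longer-term trend.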
Plan and conduct investigations using the statistical enquiry cycle: (A) justifying the variables and measures used; (B) managing sources of variation, including through the use of random sampling; (C) identifying and communicating features in context (trends, relationships between variables, and differences within and between distributions), using multiple displays; (D) making informal inferences about populations from sample data; (E) justifying findings, using displays and measures.
Investigate a given multivariate data set using the statistical enquiry cycle
Investigate bivariate numerical data using the statistical enquiry cycle