10. Representing Data

Lesson

In statistics, a **variable** is a characteristic that can be measured or observed. A variable could be something like temperature, mass, height, make of car, type of animal or goals scored. In most cases, we would expect the value of a variable to change from one observation to the next.

Data variables can be defined as either **numerical** or **categorical**.

- Numerical data is where each data point is represented by a **number**. Examples include: number of items sold each month, daily temperatures, heights of people, and ages of a population. The data can be further defined as either **discrete** (associated with counting) or **continuous** (associated with measuring). Numerical data is also known as **quantitative** data.

- Categorical data is where each data point is represented by a **word** or label. Examples include: brand names, types of animals, favourite colours, and names of countries. The data can be further defined as either **ordinal** (it can be ordered) or **nominal** (un-ordered). Categorical data is also known as **qualitative** data.
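As a rough illustration of the split described above, the sketch below labels a list of observations as numerical or categorical by checking whether every value is a number. The function name `classify_variable` and the sample lists are hypothetical, and a type check is only a heuristic: numbers used purely as labels (jersey numbers, for instance) would still need to be treated as categorical by hand.

```python
from numbers import Number


def classify_variable(values):
    """Return 'numerical' if every value is a number, otherwise 'categorical'."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


# Hypothetical sample observations, modelled on the examples in this lesson.
goals_scored = [1, 3, 0, 1, 2]
daily_temperature = [24.4, 23.0, 22.5]
make_of_car = ["Toyota", "Holden", "Mazda"]

print(classify_variable(goals_scored))       # numerical
print(classify_variable(daily_temperature))  # numerical
print(classify_variable(make_of_car))        # categorical
```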

Discrete numerical data

Discrete numerical data involves data points that are distinct and separate from each other. There is a definite 'gap' separating one data point from the next. Discrete data usually, but not always, consists of whole numbers, and is often collected by some form of **counting**.

Examples of discrete data:

| Variable | Example values |
|---|---|
| Number of goals scored per match | $1$, $3$, $0$, $1$, $2$, $0$, $2$, $4$, $2$, $0$, $1$, $1$, $2$, ... |
| Number of children per family | $2$, $3$, $1$, $0$, $1$, $4$, $2$, $2$, $0$, $1$, $1$, $5$, $3$, ... |
| Number of products sold each day | $437$, $410$, $386$, $411$, $401$, $397$, $422$, ... |

In each of these cases, there are no in-between values. We cannot have $2.5$ goals or $1.2$ people, for example.

This doesn't mean that discrete data always consists of whole numbers. Shoe sizes, an example of discrete data, are often separated by half-sizes, for example $8$, $8.5$, $9$, $9.5$. Even so, there is a definite gap between the sizes: a shoe won't ever come in size $8.145$.
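The idea of a 'definite gap' can be checked with a small sketch: discrete values sit on a grid with a fixed step, so every value should be a whole multiple of that step. The helper name `lies_on_grid` is hypothetical, and the values are passed as strings so that `Fraction` handles them exactly rather than as rounded floating-point numbers.

```python
from fractions import Fraction


def lies_on_grid(values, step):
    """Return True if every value is a whole multiple of `step`."""
    step = Fraction(step)
    return all(Fraction(v) % step == 0 for v in values)


# Shoe sizes from the lesson, separated by half-sizes.
shoe_sizes = ["8", "8.5", "9", "9.5"]

print(lies_on_grid(shoe_sizes, "0.5"))  # True
print(lies_on_grid(["8.145"], "0.5"))   # False
```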

Continuous numerical data

Continuous numerical data involves data points that can occur anywhere along a continuum. Any value is possible within a range of values. Continuous data usually consists of decimal numbers, and is often collected using some form of **measurement**.

Examples of continuous data:

| Variable | Example values |
|---|---|
| Height of trees in a forest (metres) | $12.359$, $14.022$, $14.951$, $18.276$, $11.032$, ... |
| Times taken to run a $10$ km race (minutes) | $55.34$, $58.03$, $57.25$, $61.49$, $66.11$, $59.87$, ... |
| Daily temperature (°C) | $24.4$, $23.0$, $22.5$, $21.6$, $20.7$, $20.2$, $19.7$, ... |

In practice, continuous data will always be subject to the accuracy of the measuring device being used.
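To see how the measuring device limits what gets recorded, here is a minimal sketch that rounds one underlying value to several device resolutions. The function name `record_measurement` and the 'true' height of 12.35927 m are made up for illustration; `Decimal` is used so the rounding is exact.

```python
from decimal import Decimal, ROUND_HALF_UP


def record_measurement(true_value, resolution):
    """Round a value to the smallest increment the device can display."""
    return Decimal(true_value).quantize(Decimal(resolution), rounding=ROUND_HALF_UP)


# Hypothetical true height of a tree, in metres.
height = "12.35927"

print(record_measurement(height, "0.001"))  # 12.359 (millimetre resolution)
print(record_measurement(height, "0.1"))    # 12.4   (10 cm resolution)
print(record_measurement(height, "1"))      # 12     (whole metres)
```

The underlying quantity is the same in every case; only the recorded value changes with the device.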

Ordinal categorical data

The word 'ordinal' means 'ordered'. Ordinal categorical data involves data points, consisting of words or labels, that can be ordered or ranked in some way.

Examples of ordinal data:

| Variable | Example values |
|---|---|
| Product rating on a survey | good, satisfactory, good, excellent, excellent, good, good, ... |
| Exam grades | A, C, A, B, B, C, A, B, A, A, C, B, A, B, B, B, C, A, C, ... |
| Size of fish in a lake | medium, small, small, medium, small, large, medium, large, ... |
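Because ordinal labels carry a ranking, they can be sorted once that ranking is written down. The sketch below assumes the order small < medium < large for the fish sizes; the `SIZE_RANK` mapping is a hypothetical helper, not part of any library.

```python
# Assumed ranking of the labels, from smallest to largest.
SIZE_RANK = {"small": 0, "medium": 1, "large": 2}

fish_sizes = ["medium", "small", "small", "medium", "small", "large", "medium", "large"]

# Sort by rank rather than alphabetically.
ranked = sorted(fish_sizes, key=lambda label: SIZE_RANK[label])
print(ranked)
# ['small', 'small', 'small', 'medium', 'medium', 'medium', 'large', 'large']

# A plain alphabetical sort puts 'large' first, which ignores the ranking.
print(sorted(fish_sizes)[0])  # large
```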

Nominal categorical data

The word 'nominal' means 'name'. Nominal categorical data consists of words or labels that name individual data points.

Examples of nominal data:

| Variable | Example values |
|---|---|
| Nationalities in a sporting team | German, Austrian, Italian, Spanish, Dutch, Italian, ... |
| Make of car driving through an intersection | Toyota, Holden, Mazda, Toyota, Ford, Toyota, Mazda, ... |
| Hair colour of students in a class | blonde, red, brown, blonde, black, brown, black, red, ... |

Nominal data is often described as 'un-ordered' because it can't be ordered in a way that is statistically meaningful.
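Although nominal labels cannot be ranked, they can still be counted. As a small sketch, the standard-library `Counter` tallies how often each make of car appears in the sample from the table above (using only the values shown before the ellipsis).

```python
from collections import Counter

# Makes of car listed in the lesson's example.
car_makes = ["Toyota", "Holden", "Mazda", "Toyota", "Ford", "Toyota", "Mazda"]

counts = Counter(car_makes)

print(counts["Mazda"])        # 2
print(counts.most_common(1))  # [('Toyota', 3)]
```

The most frequent label is meaningful here, whereas asking whether 'Toyota' is greater than 'Mazda' is not.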

Which two of the following are examples of numerical data?

A. Favourite flavours

B. Maximum temperature

C. Daily temperature

D. Types of horses

Which one of the following data types is discrete?

A. The number of classrooms in your school

B. Daily humidity

C. The ages of a group of people

D. The time taken to run $200$ metres

Classify this data into its correct category:

Weights of dogs

A. Categorical Nominal

B. Categorical Ordinal

C. Numerical Discrete

D. Numerical Continuous