CanadaON
Grade 9

7.03 Classify, organize and display data

Lesson

Types of data

When we collect information for statistical purposes we refer to that information as data.  Data can be classified as numerical data or categorical data.

 

Numerical data

Numerical data can be counted, ordered and measured. It can be either continuous or discrete. Numerical data is also called quantitative data.

 

Continuous numerical data

A data set is continuous if the values can take on any value within a finite or infinite interval.

Examples of continuous data are height, weight, temperature or the time taken to run $100$ metres.

Data for all of these examples could be anywhere on a scale interval and could even be fractions. For example, it might be $25.3$ degrees or a man might be $182.13$ cm tall.

Notice that each of these examples is measured with some sort of instrument: a ruler, a set of scales, a thermometer, a stopwatch. Continuous data is almost always measured.

 

Discrete numerical data

A data set is discrete if the numerical values can be counted but are distinct and separate from each other. They are often (but not always) whole number values.

Examples of discrete data are the number of pets people have, the number of goals scored in a game, and amounts of money.

Data for these examples will always have distinct values. We couldn't own $\frac{1}{4}$ of a dog or score $2.5$ goals in a game of soccer, so there is no continuity between the scores.

In some soccer tournaments, half a point is awarded for a draw. In this case, there could be a score of $2.5$, but there still could not be a score of $2.25$ or $2.75$, so the data is still discrete.

Other examples of discrete numerical data include marks on a test, views on a video, votes for a candidate in an election, and how much money you have. Notice that each of these examples of discrete data is counted, not measured. Discrete numerical data is always countable.

 

Categorical data

Categorical data is non-numeric and is represented by words. It describes the qualities or characteristics of a data set. Categorical data is also known as qualitative data.

Examples include blood groups (A, B, AB or O) or hotel star ratings.

Categories may have numeric labels, such as the numbers worn by players in a sporting team, but these labels have no numerical significance; they merely serve as labels.

Categorical data can be either ordinal or nominal.

 

Ordinal categorical data

A set of data is ordinal if the values can be counted and ordered but not measured.

Rating scales are examples of ordinal data. The finishing places in a race are another example of ordinal data. Finishing first means you were faster than the person who came second and the person who finished eighth was slower than the person who finished sixth. So the finishing places can be ordered but the differences between the finishing times may not be the same between all competitors.

 

Nominal categorical data

For nominal data, the data is split up based on different names or characteristics. Nominal data may be the names of countries you have visited or your favourite colours. We could assign these different characteristics a number where the numbers are labels. In other words, you are giving categorical data numerical labels. You can count but not order or measure nominal data.

 

Types of data
  • Categorical - represented by words
    • Ordinal - has an implicit order (such as subject grades A, B, C, D)
    • Nominal - identified by name (such as breeds of dog)
  • Numerical - associated with a number value.
    • Discrete - can only take distinct values (such as the number of goals). Usually obtained by counting.
    • Continuous - can take on any value (such as temperature). Usually obtained by measuring.

Practice questions

Question 1

Which of the following are examples of numerical data? (Select all that apply)

  A. favourite flavours
  B. maximum temperature
  C. daily temperature
  D. types of horses

Question 2

Classify this data into its correct category:

Weights of dogs

  A. Categorical Nominal
  B. Categorical Ordinal
  C. Numerical Discrete
  D. Numerical Continuous

 

Organizing data

Once data has been collected and classified, it needs to be organized. This is usually done by arranging the data in tables, including frequency tables. Continuous numerical data is usually best organized in grouped frequency tables.

Frequency and grouped frequency tables

Frequency tables are the best choice to organize categorical data and discrete numerical data when there is a small number of possible values.

Grouped frequency tables are best for continuous numerical data and discrete numerical data when the data can take a large number of possible values. The frequency recorded for a group is the sum of the frequencies for all data values contained in the group.

The tables below show examples of a frequency table used for categorical data, and a grouped frequency table used for continuous numerical data.

Frequency table

Colour of cars in the school carpark
Colour    Frequency
white     $14$
red       $2$
blue      $3$
black     $1$
yellow    $1$

Grouped frequency table

Height of year 9 students
Height (cm)    Frequency
$145-150$      $3$
$150-155$      $10$
$155-160$      $8$
$160-165$      $13$
$165-170$      $1$

 

Grouped data

When we group data, we create class intervals, which tell us the range of scores in a particular group. Class intervals should all be equal size, and there should not be gaps between intervals.

For example, if our class interval is $1-5$, we know that this class contains any values from $1$ to $5$, inclusive. If the class interval is expressed as $1-<5$, it includes any score that is greater than or equal to $1$ and less than $5$.

To help make it easier to work with our data, we usually find the class centre, which is taken as the representative value of the class interval when we analyze the data. The class centre is the middle score of each class interval. For the interval $1-5$, the class centre would be $\frac{1+5}{2}=3$.
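The class centre calculation can be written as a one-line function. This is only a sketch: the function name `class_centre` is an assumption made for this example, and it assumes the lower and upper endpoints of the interval are already known.

```python
def class_centre(lower, upper):
    """Return the midpoint of a class interval with the given endpoints."""
    return (lower + upper) / 2

# The interval 1-5 from the text has class centre 3.
print(class_centre(1, 5))  # 3.0
```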

Selecting the interval width is important. If the intervals are too narrow, many of them will be empty or nearly empty, so the shape of the distribution will not be visible. If the intervals are too wide, too much detail is lost and the shape of the distribution will not be apparent either. As a guide, $6$ to $12$ intervals will typically be most useful for moderate-size data sets.
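One possible way to build a grouped frequency table is sketched below. It assumes equal-width class intervals of the form "greater than or equal to the lower value and less than the upper value", with no gaps between them. The function name `grouped_frequency`, its parameters and the list of heights are all hypothetical, chosen only to illustrate the idea.

```python
def grouped_frequency(values, start, width, count):
    """Tally values into `count` equal-width classes of the form [lower, upper)."""
    table = {}
    for i in range(count):
        lower = start + i * width
        table[(lower, lower + width)] = 0
    for value in values:
        # Work out which class the value falls into.
        index = int((value - start) // width)
        if 0 <= index < count:  # values outside every class are ignored
            lower = start + index * width
            table[(lower, lower + width)] += 1
    return table

# Hypothetical heights in cm, made up for illustration only.
heights = [146.5, 151.0, 152.3, 154.9, 155.0, 158.2, 161.7, 163.4, 164.9, 168.0]
table = grouped_frequency(heights, start=145, width=5, count=5)
for (lower, upper), frequency in table.items():
    print(f"{lower}-<{upper}: {frequency}")
```

Changing `width` and `count` is a quick way to see how intervals that are too narrow or too wide hide the shape of the distribution.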

 

Practice questions

Question 3

Find the class centre for the class interval $17-22$.

Question 4

What would be the most appropriate way of representing data from:

  1. A survey conducted of $1000$ people, asking them how many languages they speak?

    A. Leaving the data ungrouped and constructing a frequency table
    B. Grouping the responses and constructing a frequency table

  2. A survey conducted of $1000$ people, asking them how many different countries they know the names of?

    A. Grouping the responses and constructing a frequency table
    B. Leaving the data ungrouped and constructing a frequency table

Question 5

As part of a fuel watch initiative, the price of gas at a service station was recorded each day for $21$ days. The frequency table shows the findings.

Price (in cents per litre)    Class Centre    Frequency
$130.9$-$135.9$               $133.4$         $6$
$135.9$-$140.9$               $138.4$         $5$
$140.9$-$145.9$               $143.4$         $5$
$145.9$-$150.9$               $148.4$         $5$

  1. What was the highest price that could have been recorded?

  2. How many days was the price above $140.9$ cents?

 

Displaying data

Once we have organized the data, we need to present the data in a form that will be easy to read, understand and analyze.

Displaying data

Some common ways of displaying statistical data are listed below.

  • histograms
  • bar graphs
  • line plots
  • stem and leaf plots

The best type of display to be used will depend on the type of data and purpose of the investigation.

Another type of statistical graph, the box and whisker plot, is used to display statistical summary data and will be described in a later section.

 

Histograms and bar graphs

These graphs represent the frequency of data values as the length of horizontal bars or vertical columns.

Bar graphs are usually used to display categorical data.

Histograms are similar to bar graphs, with vertical columns used to display numerical data. The main difference between a bar graph and histogram is that histograms do not have spaces between the columns.

The reason that histograms do not have gaps between columns is that the class intervals are not separate categories. Instead, the columns represent the frequency of values observed in the class intervals. The width of the columns indicates the range of values in the class intervals.

Example 1

Each student in a class was surveyed and asked about the colour of their eyes. The data is categorical and the results are displayed in a bar graph (left) and horizontal bar graph (right) below:

Example 2

Each student in a class was surveyed and asked the size of their families. The data is numerical and the results are displayed in a histogram below:

The data that was collected in this survey is discrete data because it can only take particular values (in this case whole numbers). In histograms that display discrete data, the mark for each value on the horizontal axis is located at the centre of its column. The height of each column represents the frequency of that data value.

 

Example 3

Each student in a class was surveyed and asked their heights. The data is numerical and the results are displayed in a histogram below:

The data that was collected in this survey is continuous data because it can take any value within a range. In histograms that display continuous data, the column width represents the range of each interval or bin. The height of each column represents the frequency of the data values that fall within that interval.
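A rough text version of a histogram can help show how column length relates to frequency. The sketch below is turned on its side, so each row stands in for one column, and it reuses the frequencies from the "Height of year 9 students" table earlier in this lesson. The names `height_frequencies` and `text_histogram` are assumptions made for this example. A real histogram would normally be drawn with graphing software or by hand.

```python
# Frequencies copied from the "Height of year 9 students" table earlier in the lesson.
height_frequencies = {
    "145-150": 3,
    "150-155": 10,
    "155-160": 8,
    "160-165": 13,
    "165-170": 1,
}

def text_histogram(frequencies):
    """Return one row per class interval; the row length shows the frequency."""
    return [f"{label} | {'#' * count}" for label, count in frequencies.items()]

print("\n".join(text_histogram(height_frequencies)))
```

Because the rows follow one another with no blank lines between them, the sketch mirrors the way histogram columns sit side by side without gaps.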

 

Practice questions

Question 6

Continuous data is represented in a histogram as shown:

[Histogram: horizontal axis "Score" marked at $21$, $23$, $25$, $27$, $29$, $31$; vertical axis "Frequency" marked at $10$ and $20$]

  1. Complete the following frequency table:

    Score Frequency
    $21$ $\editable{}$
    $23$ $\editable{}$
    $25$ $\editable{}$
    $27$ $\editable{}$
    $29$ $\editable{}$
    $31$ $\editable{}$

Question 7

In product testing, the number of faults detected in producing a certain piece of machinery is recorded each day for several days. The frequency table shows the results.

Number of faults    Frequency
$0-3$               $10$
$4-7$               $14$
$8-11$              $20$
$12-15$             $16$

  1. Construct a histogram to represent the data.

    [Histogram grid: "Faulty Machinery"; horizontal axis "Number of Faults" marked at $1.5$, $5.5$, $9.5$, $13.5$; vertical axis "Frequency" marked at $10$ and $20$]

  2. What is the lowest possible number of faults that could have been recorded on any particular day?

    $\editable{}$ faults

 

Line plots

Line plots are a graphical way of displaying the distribution of numerical or categorical data on a simple scale with dots representing the frequency of data values. These are often also called dot plots. They are best used for small to medium size sets of data and are good for visually highlighting how the data is spread and whether there are any gaps in the data or outliers. We will look at identifying outliers in more detail in our next lesson.

In a line plot, each individual value is represented by a single dot, displayed above a horizontal line. When data values are identical, the dots are stacked vertically. The graph appears similar to a pictograph or bar graph with the number of dots representing the total count.

  • To correctly display the distribution of the data, the dots must be evenly spaced in columns above the line
  • The scale or categories on the horizontal line should be evenly spaced
  • A line plot does not have a vertical axis
  • The line plot should be appropriately labelled
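The stacking rule for line plots can be sketched in text form. The example below assumes whole-number data and uses the letter `o` for each dot. The function name `dot_plot` and the goals data are hypothetical, made up purely for illustration.

```python
from collections import Counter

def dot_plot(values):
    """Return the lines of a text dot plot for whole-number data."""
    counts = Counter(values)
    low, high = min(values), max(values)
    tallest = max(counts.values())
    lines = []
    # Build rows from the top of the tallest stack down to the horizontal line.
    for level in range(tallest, 0, -1):
        row = ""
        for value in range(low, high + 1):
            # Every value gets a column of the same width, so the dots stay evenly spaced.
            row += " o " if counts[value] >= level else "   "
        lines.append(row.rstrip())
    lines.append("-" * 3 * (high - low + 1))
    lines.append("".join(f"{value:^3}" for value in range(low, high + 1)))
    return lines

# Hypothetical goals-per-game data, made up for illustration only.
goals = [0, 1, 1, 2, 2, 2, 4]
print("\n".join(dot_plot(goals)))
```

Values that never occur, such as $3$ in this made-up data, still get a position on the scale, which is what makes gaps in the data visible.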

 

Practice questions

Question 8

Here is a dot plot of the number of goals scored in each of Bob’s soccer games.

  1. How many times were five goals scored?

  2. Which number of goals were scored equally and most often?

    A. $1$
    B. $0$
    C. $4$
    D. $3$
    E. $2$
    F. $5$

  3. How many games were played in total?

Question 9

The goals scored by a football team in their matches are represented in the following dot plot.

  1. Complete the following frequency distribution table.

    Goals scored Frequency
    $0$ $\editable{}$
    $1$ $\editable{}$
    $2$ $\editable{}$
    $3$ $\editable{}$
    $4$ $\editable{}$
    $5$ $\editable{}$

 

Stem and leaf plot

A stem and leaf plot, or stem plot, is used for organizing and displaying numerical data. It is appropriate for small to moderately sized data sets. The graph is similar to a bar graph on its side. An advantage of a stem and leaf plot over a bar graph is the individual scores are retained and further calculations can be made accurately.

In a stem and leaf plot, the right-most digit in each data value is split from the other digits, to become the 'leaf'. The remaining digits become the 'stem'.

The values in a stem and leaf plot should be arranged in ascending order (from lowest to highest) from the stem out. To emphasize this, it is often called an ordered stem and leaf plot.

The data values $10,13,16,21,26,27,28,35,35,36,41,41,45,46,49,50,53,56,58$ are displayed in the stem and leaf plot below.

  • The stems are arranged in ascending order, to form a column, with the lowest value at the top
  • The leaf values are arranged in ascending order from the stem out, in rows, next to their corresponding stem
  • A single vertical line separates the stem and leaf values
  • There are no commas or other symbols between the leaves, only a space between them
  • In order to correctly display the distribution of the data, the leaves must line up in imaginary columns, with each data value directly below the one above
  • A stem and leaf plot includes a key that describes the way in which the stem and the leaf combine to form the data value
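The rules above can be sketched in code using the data values listed earlier in this section. This is a minimal sketch that assumes non-negative whole numbers, with the last digit as the leaf and the remaining digits as the stem. The function name `stem_and_leaf` is an assumption made for this example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of an ordered stem and leaf plot for whole numbers."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # the right-most digit becomes the leaf
        leaves[stem].append(leaf)
    rows = []
    # Include every stem from lowest to highest, even one with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves[stem])
        rows.append(f"{stem} | {row}".rstrip())
    return rows

data = [10, 13, 16, 21, 26, 27, 28, 35, 35, 36, 41, 41, 45, 46, 49, 50, 53, 56, 58]
print("\n".join(stem_and_leaf(data)))
print("Key: 1 | 3 = 13")
```

Sorting the values first is what makes the plot ordered, and the final line prints the key that tells the reader how a stem and a leaf combine.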

 

Practice questions

Question 10

Which of the following is true of a stem-and-leaf plot?

Stem | Leaf
$0$  | $7$
$1$  |
$2$  |
$3$  | $1$ $3$ $3$ $3$
$4$  | $1$ $2$ $3$ $4$ $9$
$5$  | $1$ $2$ $4$ $5$ $5$
$6$  | $0$

Key: $1 \mid 2 = 12$

  A. The scores are ordered.
  B. A stem-and-leaf plot does not give an idea of outliers and clusters.
  C. It is only appropriate for data where scores have high frequencies.
  D. The individual scores cannot be read on a stem-and-leaf plot.

Question 11

The stem-and-leaf plot below shows the ages of the people who entered through the gates of a concert in the first $5$ seconds.

Stem | Leaf
$1$  | $1$ $2$ $4$ $5$ $6$ $6$ $7$ $9$ $9$
$2$  | $2$ $3$ $5$ $5$ $7$
$3$  | $1$ $3$ $8$ $9$
$4$  |
$5$  | $8$

Key: $1 \mid 2 = 12$ years old

  1. How many people passed through the gates in the first $5$ seconds?

  2. What was the age of the youngest person?

    The youngest person was $\editable{}$ years old.

  3. What was the age of the oldest person?

    The oldest person was $\editable{}$ years old.

  4. What proportion of the concert-goers were under $20$ years old?

Outcomes

9.D1.2

Represent and statistically analyse data from a real-life situation involving a single variable in various ways, including the use of quartile values and box plots.
