
10.01 Random variables

Lesson

Random variables

Where statistics and probability meet, we begin to examine theoretical and experimental situations in which we observe the outcomes of random phenomena.

A random variable is a variable whose possible values are numerical outcomes of a random phenomenon. These values are either discrete or continuous. This chapter focuses on discrete random variables.

Before we continue, let's recall the difference between categorical, discrete and continuous data.

  • Categorical - data is non-numerical. In other words, it describes the qualities or characteristics of a data set. Categorical data is also known as qualitative data.
  • Discrete - data is drawn from isolated values that are separated from one another. For each pair of neighbouring values in the data set there is no value in between. The data is often collected by counting.
  • Continuous - data is drawn from an unbroken interval of possible values. Continuous data is almost always measured.

 

Defining a discrete random variable

  1. Firstly, the outcomes of the situation or the experiment must take discrete values.
    • Remember that discrete data is numerical and countable.
  2. The outcomes must occur at random.
    • In an experiment or situation all the outcomes must occur randomly. 
  3. The outcomes must vary.
    • There must be more than one possible outcome. Consider the following examples:
      • Tossing two coins and counting the number of heads. The possible outcomes are $0$, $1$ and $2$.
      • Rolling two dice and finding the sum of the topmost faces. The possible outcomes are $2, 3, \ldots, 12$.
      • The number of days that reach $20^\circ$C during March next year. The possible outcomes are $0$ through to $31$.
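As a rough illustration, the short Python sketch below finds the possible values for the first two examples by enumerating every equally likely outcome and collecting the distinct results. The variable names are our own, chosen for illustration.

```python
from itertools import product

# Two coins: each toss is H or T; count the heads in every possible outcome.
coin_outcomes = list(product("HT", repeat=2))
heads_values = sorted({toss.count("H") for toss in coin_outcomes})

# Two dice: each die shows 1 to 6; add the two topmost faces.
dice_outcomes = list(product(range(1, 7), repeat=2))
sum_values = sorted({a + b for a, b in dice_outcomes})

print(heads_values)  # [0, 1, 2]
print(sum_values)    # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Each list is a finite set of separated values, which is exactly what makes these random variables discrete.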

When thinking about what a discrete random variable (or DRV for short) actually is, remember that the name itself lists its three properties: its values are discrete, they occur at random, and they vary.

 

A cat with three kittens

Let's say there's a cat who's about to give birth to three kittens. Before they are born, we know that each will be either male or female, but the exact combination of male and female kittens is unknown. We can instead consider all possible outcomes of the birth. An easy way to do this is to represent the possibilities with the following tree diagram.

What we've done so far is create a simple sample space, something we have done many times before. What we need to do now, though, is choose something to focus on. There are two obvious choices: the number of female kittens or the number of male kittens.
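The branches of the tree diagram can also be listed by computer. The sketch below is a minimal illustration using the labels F (female) and M (male), which are our own; `itertools.product` generates each of the $2 \times 2 \times 2 = 8$ branches in order.

```python
from itertools import product

# Each kitten is either F (female) or M (male); one string per branch of the tree.
sample_space = ["".join(branch) for branch in product("FM", repeat=3)]

print(len(sample_space))  # 8
print(sample_space)       # ['FFF', 'FFM', 'FMF', 'FMM', 'MFF', 'MFM', 'MMF', 'MMM']
```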

Let's focus on the number of female kittens. 

How many female kittens will we see once all the kittens are born? Either $0$, $1$, $2$ or $3$. These are countable and able to be written in order.

So the number of female kittens will vary and will occur at random.

When a quantity varies, we can define a variable. In this case let's define $X$ as the number of female kittens born.

We use lower-case $x$ for the values that the random variable $X$ can take, which here are $0$, $1$, $2$ and $3$.

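Each of these values of $X$ has a particular probability of occurring. Assuming each kitten is equally likely to be male or female, independently of the others, each of the $8$ branches of the tree diagram has probability $\frac{1}{8}$, so we can count branches to find each probability. We can summarise these probabilities in a table.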

$x$ | $0$ | $1$ | $2$ | $3$
$P(X=x)$ | $\frac{1}{8}$ | $\frac{3}{8}$ | $\frac{3}{8}$ | $\frac{1}{8}$
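To see where the table comes from, the sketch below treats $X$ as a function that maps each branch of the tree to the number of female kittens, then counts branches. It assumes, as the table does, that all $8$ branches are equally likely; the function name `number_of_females` is hypothetical. Exact fractions are used so the output can be compared with the table directly.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: each kitten is independently equally likely to be F or M,
# so every one of the 8 branches has probability 1/8.
sample_space = list(product("FM", repeat=3))

def number_of_females(outcome):
    """The random variable X: maps one branch to its number of female kittens."""
    return outcome.count("F")

# Count how many branches give each value of X, then divide by the total.
counts = Counter(number_of_females(outcome) for outcome in sample_space)
distribution = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}

for x, p in distribution.items():
    print(x, p)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8

print(sum(distribution.values()))  # 1
```

The final line checks that the probabilities add to $1$, which must hold for any probability distribution.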

What we've created here is a discrete probability distribution, represented by an individual probability table. The probabilities add to $1$, as they must for any probability distribution.

$x$ represents the individual values that the random variable $X$ can take.

$P(X=x)$ represents the probability that the random variable $X$ takes the value $x$. Put more simply, it's the probability of each of the outcomes occurring.

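We could also represent this information with a cumulative probability table, where $P(X\le x)$ represents the probability of the outcome being less than or equal to $x$. Each entry is a running total of the individual probabilities; for example, $P(X\le 1)=P(X=0)+P(X=1)=\frac{1}{8}+\frac{3}{8}=\frac{4}{8}$.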

$x$ | $0$ | $1$ | $2$ | $3$
$P(X\le x)$ | $\frac{1}{8}$ | $\frac{4}{8}$ | $\frac{7}{8}$ | $\frac{8}{8}$
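Because each cumulative entry is a running total, one way to sketch the cumulative table in Python is `itertools.accumulate` over the individual probabilities, hard-coded here from the individual probability table.

```python
from fractions import Fraction
from itertools import accumulate

# Individual probability table for X, the number of female kittens.
x_values = [0, 1, 2, 3]
probabilities = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

# Running total: P(X <= x) = P(X = 0) + ... + P(X = x).
cumulative = list(accumulate(probabilities))

for x, c in zip(x_values, cumulative):
    print(x, c)
# 0 1/8
# 1 1/2
# 2 7/8
# 3 1
```

`Fraction` reduces automatically, so $\frac{4}{8}$ prints as `1/2` and $\frac{8}{8}$ prints as `1`; the values match the cumulative table.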

 

Practice questions

Question 1

The weights of babies born in a local hospital in the last month have been recorded. One midwife is interested in the probabilities for the number of babies, out of the next $5$ born, that will weigh more than $2.4$ kg.

  1. Can this situation be modelled by a discrete random variable?

    A. Yes

    B. No
  2. If $Y$ represents the number of babies in the next $5$ babies born that weigh more than $2.4$ kg, list all the possible outcomes.

    Write all the outcomes on the same line, separated by commas.

Question 2

A multiple choice test contains $10$ questions, each with subparts (a) and (b). The answer to each subpart is awarded a half mark if correct, and zero if incorrect. If a student randomly answers each question, can the number of marks gained on this test be modelled by a discrete random variable?

    A. Yes

    B. No

Question 3

The quality control manager of the installation of a fibre-optic network is monitoring the faults found in the cable being used.

  1. Can the distance in metres between successive faults in the fibre-optic cable being analysed be modelled by a discrete random variable?

    A. Yes

    B. No
  2. Why can this not be represented by a discrete random variable?

    A. The possible outcomes are continuous, and therefore not discrete.

    B. The possible outcomes are categorical, and therefore it's not a random variable.
  3. Can the number of faults found in a randomly chosen $100$ m length of the fibre-optic cable be modelled by a discrete random variable?

    A. Yes

    B. No

Outcomes

2.6.1.1

understand the concepts of a discrete random variable and its associated probability function, and its use in modelling data

2.6.1.2

use relative frequencies obtained from data to determine point estimates of probabilities associated with a discrete random variable
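Outcome 2.6.1.2 involves estimating probabilities from relative frequencies. As a sketch only, assuming simulated rather than recorded data, the code below generates hypothetical litters of three kittens (each kitten female with probability $0.5$, an assumption) and uses the relative frequency of each value of $X$ as a point estimate of $P(X=x)$. The function names and the seed are illustrative choices, not part of the lesson.

```python
import random
from collections import Counter

def simulate_litters(num_litters, seed=2024):
    """Hypothetical data: litters of 3 kittens, each female with probability 0.5 (assumed)."""
    rng = random.Random(seed)
    # Each litter records X, the number of female kittens (True counts as 1).
    return [sum(rng.random() < 0.5 for _ in range(3)) for _ in range(num_litters)]

def relative_frequencies(observations):
    """Point estimate of P(X = x): the fraction of observations equal to x."""
    counts = Counter(observations)
    n = len(observations)
    return {x: counts[x] / n for x in sorted(counts)}

estimates = relative_frequencies(simulate_litters(10_000))
for x, p in estimates.items():
    print(x, round(p, 3))
```

With many litters the estimates should land close to $\frac{1}{8}$, $\frac{3}{8}$, $\frac{3}{8}$ and $\frac{1}{8}$, but the exact values depend on the seed, so no output is shown here.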
