A random variable is, quite simply, a numerical quantity whose value is determined by the result of a random experiment. A random variable can take certain numerical values, which is another way of saying that the experiment has multiple possible outcomes. The random variable may be discrete or continuous. The set of all of the values of the random variable, together with their associated probabilities, is a probability distribution.
Before we get into defining or explaining each of these terms, we'll start with an example which will make this a little clearer.
Let's say there's a cat who's about to give birth to three kittens. Before they are born we know each will either be male or female. The exact combination of male and female kittens is unknown before they are born. We can instead consider all possible outcomes of the birth. An easy way to do this is to represent the possibilities with the following tree diagram.
What we've done so far is create a simple sample space. You've done this many times before. Now that we have our sample space, we can choose something to focus on. There are only two things (variables) to focus on in this situation: the number of female kittens or the number of male kittens.
Let's focus on the number of female kittens. How many female kittens will we see once all the kittens are born? Either $0$, $1$, $2$ or $3$.
We know that the number of female kittens will vary and will occur at random. When a quantity varies, we can define a variable. In this case let's define $X$ as the number of female kittens born. The lowercase $x$ stands for a particular value that $X$ can take, so here $x$ can be $0$, $1$, $2$ or $3$.
Each of these values of $X$ has a particular chance, or probability, of occurring. We can use our tree diagram and a table to summarise these probabilities.
$x$ | $0$ | $1$ | $2$ | $3$ |
---|---|---|---|---|
$P\left(X=x\right)$ | $\frac{1}{8}$ | $\frac{3}{8}$ | $\frac{3}{8}$ | $\frac{1}{8}$ |
What we've created here is a discrete probability distribution - a concise summary of all of the possible outcomes of the experiment and their associated probabilities. We have represented it in a table, but we could also choose to display it graphically.
$x$ represents an individual value that the random variable $X$ can take. $P\left(X=x\right)$ represents the probability that $X$ takes the value $x$. Put more simply, it's the probability of each of the outcomes occurring.
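The kitten distribution can also be built by listing the sample space and counting. The sketch below assumes each kitten is equally likely to be male or female, independently of the others; the names `sample_space`, `counts` and `distribution` are ours, chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: each kitten is independently male ("M") or female ("F")
# with equal probability, so all 8 paths through the tree are equally likely.
sample_space = list(product("MF", repeat=3))

# X = number of female kittens in each outcome of the sample space.
counts = Counter(outcome.count("F") for outcome in sample_space)

# P(X = x) = (number of outcomes with x females) / (total number of outcomes).
distribution = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

Under that assumption, the probabilities agree with the table above.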
A random variable has a pretty straightforward and self-contained definition: the outcomes must occur at random and they must vary (which means there must be more than one possible outcome). In our example, we didn't know how many female kittens would be born, the number was determined at random, and there were four possible numbers of female kittens: $0$, $1$, $2$ or $3$.
To be a discrete random variable, the outcomes of the situation or the experiment must take discrete values. Remember that discrete data is numerical and is counted, so it typically takes whole-number values. You'll know you're talking about a discrete random variable when its values are found by counting. In our example, we counted that there could be either $0$, $1$, $2$ or $3$ female kittens. We couldn't have had half of a kitten!
Continuous data is also numerical, but it is data that has been measured. For example, if you were to measure the heights of students in a particular grade, the possible outcomes would exist over a range of heights. Let's say their heights were between $155$ cm and $175$ cm. A height is a measured length, so it doesn't need to be an integer, and could be measured to any degree of accuracy we chose. This is an example of a continuous random variable.
Remember that what you're asking yourself here is whether the outcomes are counted or measured.
Consider rolling a single die. We can define our discrete random variable $X$ as the value shown on the top side. We know that each of these outcomes is equally likely, so the probabilities shown in our discrete probability distribution are all the same. This is what we call a uniform random variable.
$x$ | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ |
---|---|---|---|---|---|---|
$P\left(X=x\right)$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ |
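One way to test whether a distribution is uniform is to check that every outcome has the same probability. This is a minimal sketch for a fair six-sided die; the helper `is_uniform` is a hypothetical name introduced here for illustration.

```python
from fractions import Fraction

# A fair die: each face from 1 to 6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}


def is_uniform(dist):
    """Return True when every outcome has the same probability."""
    return len(set(dist.values())) == 1


print(is_uniform(die))
```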
Now, let's think about a maths class and their results on their latest pop quiz. Here, we can define the marks for the quiz as our discrete random variable $X$. In a class of ten students, two students scored $6$, three scored $7$, one scored $8$ and four scored $9$. It is clear that these outcomes are not equally likely, and so, like many discrete random variables, this is a non-uniform random variable.
$x$ | $6$ | $7$ | $8$ | $9$ |
---|---|---|---|---|
$P\left(X=x\right)$ | $\frac{2}{10}$ | $\frac{3}{10}$ | $\frac{1}{10}$ | $\frac{4}{10}$ |
This second example also allows us to see how a discrete probability distribution can be constructed using the relative frequencies of events as point estimates for the probability of the discrete random variable.
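Turning frequencies into relative frequencies is a short calculation: divide each frequency by the total number of observations. The sketch below does this for the quiz marks; the list `marks` simply writes out the ten scores described above, and the variable names are ours.

```python
from collections import Counter
from fractions import Fraction

# The ten quiz marks: two 6s, three 7s, one 8 and four 9s.
marks = [6, 6, 7, 7, 7, 8, 9, 9, 9, 9]

frequencies = Counter(marks)
total = len(marks)

# Relative frequency = frequency / total, used as an estimate of P(X = x).
# Fraction stores values in lowest terms, so 2/10 is held as 1/5.
quiz_distribution = {x: Fraction(f, total) for x, f in sorted(frequencies.items())}

for x, p in quiz_distribution.items():
    print(f"P(X = {x}) = {frequencies[x]}/{total} = {float(p)}")
```

Keep in mind that with only ten students these are estimates based on one class, not exact probabilities.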
Other than using a table, it is possible to show a discrete probability distribution on a graph. Below are two examples showing a discrete probability distribution. Notice that the values on the $y$-axis lie between $0$ and $1$, as this is the range of values that the probability of each outcome can take.
Regardless of whether the discrete random variable is uniform or non-uniform, the probabilities in our distribution must always sum to $1$. The symbol $\Sigma$, which is the Greek letter sigma, is used mathematically to represent a 'sum', so you will see this property of probability distributions written as $\Sigma P\left(x\right)=1$. The outcomes we display are always numerical, and the probability of each possible outcome must lie in the range $0\le P\left(x\right)\le1$.
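These two conditions translate directly into a check. The sketch below is one possible way to write it; `is_probability_distribution` is a hypothetical helper, and the `not_valid` table uses made-up values purely to show a failing case. Exact fractions are used so the sum is not affected by floating-point rounding.

```python
from fractions import Fraction


def is_probability_distribution(dist):
    """Check that each P(x) lies in [0, 1] and that the P(x) sum to 1."""
    probabilities = list(dist.values())
    each_in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = sum(probabilities) == 1
    return each_in_range and sums_to_one


# The kitten distribution from earlier in the lesson.
kittens = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Made-up values for illustration: each lies in [0, 1] but the sum is 5/4.
not_valid = {1: Fraction(1, 2), 2: Fraction(3, 4)}

print(is_probability_distribution(kittens))
print(is_probability_distribution(not_valid))
```

If the probabilities are stored as decimals instead, comparing the sum to $1$ with a small tolerance (for example with `math.isclose`) is safer than an exact equality test.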
On average, the number of green snakes in each packet of snakes sold is $5$.
Can this data be represented by a discrete random variable?
Yes
No
Is the following a probability distribution?
$x$ | $2$ | $4$ | $6$ | $8$ |
---|---|---|---|---|
$p\left(x\right)$ | $0.2$ | $0.4$ | $0.6$ | $0.8$ |
No
Yes
Danielle records her team's winning or losing margins over $10$ games of the hockey season, with winning margins recorded as positive values and losing margins as negative values. The margins were recorded below.
$X$ | $-1$ | $4$ | $2$ | $3$ | $1$ | $2$ | $4$ | $-1$ | $2$ | $1$ |
---|---|---|---|---|---|---|---|---|---|---|
Let $X$ be the margin of a given game. Summarise this data in a frequency table.
$X$X | Frequency |
---|---|
$-1$ | $\editable{}$ |
$1$ | $\editable{}$ |
$2$ | $\editable{}$ |
$3$ | $\editable{}$ |
$4$ | $\editable{}$ |
Hence, complete this table for the discrete probability distribution for $X$.
$x$ | $-1$ | $1$ | $2$ | $3$ | $4$ |
---|---|---|---|---|---|
$P\left(X=x\right)$P(X=x) | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |
A six-sided die with numbers from $1$ to $6$ is weighted such that $P\left(\text{each prime number}\right)=0.1$ and $P\left(4\right)=P\left(6\right)=0.3$.
Let $X$ represent the possible outcomes from one roll of the die.
Complete the table representing the probability distribution of $X$X below.
Enter the values of $x$x from left to right in ascending order.
$x$ | $1$ | $2$ | $3$ | $4$ | $5$ | $6$ |
---|---|---|---|---|---|---|
$P\left(X=x\right)$ | $0.1$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ | $\editable{}$ |
Calculate $P\left(X<3\right)$.
Calculate $P\left(X=3|X\le5\right)$.
Calculate $P\left(X<3|X<5\right)$.
Calculate $P\left(X<4|X\ge2\right)$.