8.01 Probability density functions

Lesson

Probability distributions

A random variable assigns a numerical value to each outcome of an experiment. The random variable may be discrete or continuous. The set of all values that the random variable can take, together with their associated probabilities, is called a probability distribution.

For example, the probability distribution for rolling a die is shown in the table below, where $X$ is the random variable and $x$ represents a specific outcome:

$x$ | $1$ | $2$ | $3$ | $4$ | $5$ | $6$
$P\left(X=x\right)$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$ | $\frac{1}{6}$

The outcomes of a uniform random variable are all equally likely. For example, every outcome from rolling a die has the same probability, $\frac{1}{6}$.
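As a minimal sketch, the die distribution can be stored as a mapping from each outcome to its probability. The names `die`, `total` and `p_even` are illustrative choices, not part of the lesson.

```python
from fractions import Fraction

# Probability distribution of a fair six-sided die: outcome x -> P(X = x).
die = {x: Fraction(1, 6) for x in range(1, 7)}

# A valid distribution has non-negative probabilities that sum to 1.
total = sum(die.values())
is_valid = all(p >= 0 for p in die.values()) and total == 1

# The probability of an event is the sum over the outcomes it contains.
p_even = sum(p for x, p in die.items() if x % 2 == 0)

print(total, is_valid, p_even)
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to $1$ does not suffer from floating-point rounding.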

Discrete probability distributions describe random variables that take discrete values. For example, when rolling a die, the possible outcomes are $\left\{1,2,3,4,5,6\right\}$. Continuous probability distributions describe random variables that can take any value within a given numerical range, for example, the height of individuals or the amount of time you wait for a train. Often the values are measured along a scale.

Estimating probabilities from relative frequency and histograms

Previously, we often organised and displayed continuous distributions by grouping them in class intervals and determining the frequency of each class. Using this data we were able to determine the relative frequency, that is, the estimated probability of a score being in a particular range. The sum of all the relative frequencies is $1$, as this is the sum of all the probabilities.

We can draw a histogram and polygon of the relative frequency in the same way we draw a frequency histogram. The relative frequency histogram has a significant property: if the width of each column is $1$, the sum of the areas of the rectangular columns is $1$. We can estimate the probability of a range of scores using the histogram.

 

Worked example

Example 1

The table below shows the times that runners took to complete a race.

Class interval | Class centre | Frequency
$45-<50$ | $47.5$ | $9$
$50-<55$ | $52.5$ | $7$
$55-<60$ | $57.5$ | $20$
$60-<65$ | $62.5$ | $30$
$65-<70$ | $67.5$ | $6$

(a) Add a column for the relative frequency.

Think: The relative frequency of a class is $\frac{\text{frequency of the class}}{\text{total frequency}}$. Here the total frequency is $9+7+20+30+6=72$.

Class interval | Frequency | Relative frequency
$45-<50$ | $9$ | $\frac{9}{72}=\frac{1}{8}$
$50-<55$ | $7$ | $\frac{7}{72}$
$55-<60$ | $20$ | $\frac{20}{72}=\frac{5}{18}$
$60-<65$ | $30$ | $\frac{30}{72}=\frac{5}{12}$
$65-<70$ | $6$ | $\frac{6}{72}=\frac{1}{12}$
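The relative frequency column can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the class intervals and frequencies are copied from the table in Example 1.

```python
from fractions import Fraction

# Class intervals (lower bound included, upper bound excluded) and
# frequencies taken from Example 1.
classes = [(45, 50), (50, 55), (55, 60), (60, 65), (65, 70)]
frequencies = [9, 7, 20, 30, 6]

total = sum(frequencies)
relative = [Fraction(f, total) for f in frequencies]

# Fraction reduces each ratio automatically, e.g. 9/72 becomes 1/8.
for (low, high), rf in zip(classes, relative):
    print(f"{low}-<{high}: {rf}")
```

The relative frequencies sum to $1$, which is a quick way to check the column for arithmetic slips.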

(b) Sketch a frequency histogram for the relative frequencies.

(c) Estimate each probability.

(i) $P\left(X\le55\right)$

Think: This is the probability of the score being between $45$ and $50$ or between $50$ and $55$.

Do:

$P\left(X\le55\right)=\frac{9}{72}+\frac{7}{72}=\frac{16}{72}=\frac{2}{9}$

 

(ii) $P\left(X\ge60\right)$

Think: This is the probability that the score is either between $60$ and $65$ or between $65$ and $70$.

Do:

$P\left(X\ge60\right)=\frac{5}{12}+\frac{1}{12}=\frac{6}{12}=\frac{1}{2}$

 

(iii) $P\left(50\le X\le65\right)$

Think: This is the probability of the score being either between $50$ and $55$, $55$ and $60$, or $60$ and $65$. We can calculate this using complementary events.

Do:

$P\left(50\le X\le65\right)=1-P\left(X\le50\right)-P\left(65\le X\le70\right)=1-\frac{9}{72}-\frac{6}{72}=\frac{57}{72}=\frac{19}{24}$

 

Reflect: Relative frequency gives only an estimate of the probability. For continuous data, other methods can determine the probability more accurately.
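The three estimates in part (c) of Example 1 can be checked by summing the frequencies of the relevant classes. The sketch below assumes a hypothetical helper, `estimate`, that only accepts bounds lying on class boundaries.

```python
from fractions import Fraction

# Frequencies from Example 1, keyed by (lower, upper) class interval.
frequencies = {(45, 50): 9, (50, 55): 7, (55, 60): 20, (60, 65): 30, (65, 70): 6}
total = sum(frequencies.values())

def estimate(low, high):
    """Estimated probability of a time in the classes lying within [low, high]."""
    count = sum(f for (a, b), f in frequencies.items() if a >= low and b <= high)
    return Fraction(count, total)

p_i = estimate(45, 55)    # part (i)
p_ii = estimate(60, 70)   # part (ii)
p_iii = estimate(50, 65)  # part (iii), summed directly

# Part (iii) again, using complementary events as in the worked solution.
p_iii_complement = 1 - estimate(45, 50) - estimate(65, 70)

print(p_i, p_ii, p_iii, p_iii_complement)
```

Both routes to part (iii) give the same value, which confirms the complementary-events shortcut.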

Probability density function (PDF)

Discrete probability functions are represented graphically by a probability mass function ($f\left(x\right)=P\left(X=x\right)$). A probability mass function looks like a bar graph. The value of $f\left(x\right)$ gives the probability of the random variable having the outcome $x$. The sum of all of the probabilities must be equal to $1$. The two graphs below are examples of probability mass functions.

A probability density function (PDF) represents a continuous probability distribution graphically. This function models the limiting case of a histogram as the amount of data increases and the class interval size decreases.

For example, the histogram on the left below shows a random sample of weights of apples in an orchard. We could use the proportion of apples that lie in the interval $90$ to $94$ grams to estimate the probability of a randomly selected apple from the orchard lying in this range.

Frequency histogram
Frequency histogram with probability density function

We could take a larger sample and use smaller class intervals, as shown in the histogram on the right. If we imagine the number of observations increasing indefinitely while the class intervals become very narrow, the histogram approaches the continuous shape shown. This imagined curve corresponds to what is called a probability density function.

The area above a particular interval and below the probability density curve corresponds to the probability that a future observation will fall within that interval. This is a similar idea to the way a histogram works.
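A small simulation can illustrate how the proportion of observations in an interval settles towards the area under a density curve as the sample grows. The model here is purely an assumption for illustration: the lesson does not state how apple weights are distributed, so the sketch supposes a normal distribution with mean $92$ g and standard deviation $4$ g.

```python
from statistics import NormalDist

# Hypothetical model (an assumption, not from the lesson): apple weights in
# grams are normally distributed with mean 92 and standard deviation 4.
model = NormalDist(mu=92, sigma=4)

# Area under the density curve between 90 and 94 grams.
area = model.cdf(94) - model.cdf(90)

# Proportion of simulated apples in the same interval for growing samples.
proportions = {}
for n in (100, 10_000, 200_000):
    sample = model.samples(n, seed=0)
    proportions[n] = sum(90 <= w <= 94 for w in sample) / n

print(round(area, 4))
for n, p in proportions.items():
    print(n, round(p, 4))
```

With a large sample the simulated proportion should sit close to the area, which is the sense in which the density curve is the limit of the relative frequency histogram.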

Properties of probability density functions

A probability density function, $f\left(x\right)$, must satisfy the following two properties:

  • $f\left(x\right)\ge0$ for all $x$ (since probabilities cannot be negative)
  • $\int_{-\infty}^{+\infty}\ f\left(x\right)\ dx=1$ (because the total probability is $1$).

Note: Often the probability density function is non-zero only on an interval $\left[a,b\right]$ and can be defined as $0$ elsewhere. From the second property above we would then have $\int_a^b\ f\left(x\right)\ dx=1$.
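The two properties can be checked numerically for a candidate function on an interval $\left[a,b\right]$. The following is a rough sketch, not a proof: it samples the function at midpoints only, uses the midpoint rule for the area, and the helper names and the two test functions ($2x$ and $x$ on $\left[0,1\right]$) are hypothetical examples.

```python
def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def looks_like_pdf(f, a, b, n=100_000, tol=1e-6):
    """Check both properties on [a, b], assuming f is 0 outside [a, b].

    Non-negativity is only tested at the sampled midpoints.
    """
    width = (b - a) / n
    non_negative = all(f(a + (i + 0.5) * width) >= 0 for i in range(n))
    total_area = integrate(f, a, b, n)
    return non_negative and abs(total_area - 1) < tol

# Hypothetical candidates: 2x on [0, 1] has area 1; x on [0, 1] has area 1/2.
result_valid = looks_like_pdf(lambda x: 2 * x, 0, 1)
result_invalid = looks_like_pdf(lambda x: x, 0, 1)
print(result_valid, result_invalid)
```

A numerical check like this is useful for spotting mistakes, but an exact integral is still needed to confirm that the area is exactly $1$.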

 

 

Worked example

Example 2

A function is given by $f\left(x\right)=4x^2$ for $1\le x\le4$. Is this a continuous probability distribution?

Think: The function is a continuous probability distribution if the integral of the function over the given domain is $1$ and $f\left(x\right)\ge0$ for all values in the given domain.

Do: When you graph $f\left(x\right)=4x^2$ you will find that when $x$ is between $1$ and $4$, the curve is above the $x$-axis, therefore $f\left(x\right)\ge0$ over the domain.

To be a continuous probability function it must also satisfy $\int_a^b\ f\left(x\right)\ dx=1$. To check this we need to evaluate the integral of $f\left(x\right)$ from $1$ to $4$:

$\int_1^4\ 4x^2\ dx=\left[\frac{4x^3}{3}\right]_1^4=\frac{256}{3}-\frac{4}{3}=84$

The area is $84$, which is greater than $1$, and therefore this function is not a continuous probability distribution.
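The area in Example 2 can be confirmed exactly from an antiderivative of $4x^2$, namely $\frac{4x^3}{3}$. The sketch below uses illustrative names and also computes the constant $k$ for which $k\times4x^2$ would have area $1$ on the same interval, which is a side observation rather than part of the example.

```python
from fractions import Fraction

def antiderivative(x):
    """An antiderivative of f(x) = 4x^2: F(x) = 4x^3 / 3."""
    return Fraction(4, 3) * Fraction(x) ** 3

# Definite integral of 4x^2 from 1 to 4.
area = antiderivative(4) - antiderivative(1)

# Scaling constant that would make k * 4x^2 integrate to 1 on [1, 4].
k = 1 / area

print(area, k)
```

Because the area is $84$ rather than $1$, the function as given fails the second property, in agreement with the worked solution.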

Uniform continuous probability distributions

A continuous probability distribution is uniform when its density is constant over the domain, so that intervals of equal width are equally likely. The corresponding PDF will be a horizontal line on the number plane.

Uniform continuous probability distributions

For a uniform distribution on the domain $\left[a,b\right]$, the probability density function $f\left(x\right)$ has the form:

$f\left(x\right)=\frac{1}{b-a}$ for $a\le x\le b$, and $f\left(x\right)=0$ elsewhere.

A uniform probability density function
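A uniform density is simple enough to write directly as a function. In this sketch the function names and the interval $\left[2,10\right]$ are hypothetical, and the endpoints are assumed to be integers so that `Fraction` can keep the results exact.

```python
from fractions import Fraction

def uniform_pdf(x, a, b):
    """Density of a uniform distribution on [a, b]: 1/(b - a) inside, 0 outside."""
    return Fraction(1, b - a) if a <= x <= b else Fraction(0)

def uniform_probability(c, d, a, b):
    """P(c <= X <= d) for a <= c <= d <= b: a rectangle of width d - c."""
    return (d - c) * Fraction(1, b - a)

# Hypothetical interval [2, 10], so the height of the density is 1/8.
height_inside = uniform_pdf(5, 2, 10)
height_outside = uniform_pdf(11, 2, 10)
whole_area = uniform_probability(2, 10, 2, 10)
part_area = uniform_probability(4, 6, 2, 10)

print(height_inside, height_outside, whole_area, part_area)
```

The whole rectangle has area $1$, which is why the height must be $\frac{1}{b-a}$.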

 

Worked example

Example 3

Is the function a probability density function?

Think: Remember a function is a PDF if the area under the function is $1$ and if $f\left(x\right)\ge0$ for all values of $x$.

Do: From the graph we can see that the function lies entirely on or above the $x$-axis, which means that $f\left(x\right)\ge0$. The area under the function is the area of a rectangle that measures $20$ by $\frac{1}{20}$. Hence the area is:

$\text{Area}=20\times\frac{1}{20}=1$

Therefore the function is a probability density function.

Finding the probability for continuous probability distributions

If we wanted to find the probability that the outcome of a continuous random variable is a specific value, say $2$, it would be $0$, because a definite integral over an interval of zero width is $0$. Instead, we can use a continuous probability distribution to find the probability that a random variable has an outcome within a particular interval.

This probability is the area under the probability density function over the interval. To find the area we need to find the integral of $f\left(x\right)$ between $a$ and $b$, where $f\left(x\right)$ is the probability density function and $a$ and $b$ are outcomes of the random variable within the given domain. This is notated as:

$P\left(a\le X\le b\right)=\int_a^b\ f\left(x\right)\ dx$

Note that because $P\left(X=a\right)=0$ and $P\left(X=b\right)=0$, we have $P\left(a<X<b\right)=P\left(a\le X\le b\right)$.
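When an antiderivative is awkward to find, the integral can be approximated numerically. The sketch below uses Simpson's rule, which is one common choice rather than a method the lesson prescribes, and a hypothetical density $f\left(x\right)=\frac{x}{2}$ on $\left[0,2\right]$.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with Simpson's rule (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weights 4 (odd i) and 2 (even i).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def density(x):
    """Hypothetical density: x / 2 on [0, 2], which has total area 1."""
    return x / 2

p_interval = simpson(density, 0.5, 1.5)  # P(0.5 <= X <= 1.5)
p_point = simpson(density, 1, 1)         # P(X = 1): zero-width interval

print(p_interval, p_point)
```

The zero-width case returns $0$, matching the note that including or excluding the endpoints does not change the probability.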

For a probability density function defined over an interval $\left[a,b\right]$, we can also find the probability that the outcome is at most a value $r$, where $a\le r\le b$.

To find the area we need to find the integral of $f\left(x\right)$ between $a$ and $r$, where $f\left(x\right)$ is the probability density function and $a$ and $b$ define the domain of the outcomes of the random variable. This is notated as:

$P\left(a\le X\le r\right)=\int_a^r\ f\left(x\right)\ dx$
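As a short worked illustration, take the hypothetical density $f\left(x\right)=2x$ on $\left[0,1\right]$ (an assumed example, chosen because its total area is $1$). Integrating from the lower end of the domain up to $r$ gives:

```latex
P(0 \le X \le r) = \int_0^r 2x \, dx = \left[ x^2 \right]_0^r = r^2, \qquad 0 \le r \le 1
```

Setting $r=0$ gives probability $0$ and setting $r=1$ gives probability $1$, as expected at the two ends of the domain.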

Worked examples

Example 4

For the probability density function, find:

(a) $P\left(0\le X\le2\right)$

Think: To find the probability that $X$ is between $0$ and $2$, we shade the corresponding region under the function and find its area, which is the area of a rectangle.

Do: The rectangle has dimensions $2$ by $\frac{1}{20}$, therefore:

$\text{Area}=2\times\frac{1}{20}=0.1$

And:

$P\left(0\le X\le2\right)=0.1$

 

(b) $P\left(4\le X\le15\right)$

Do: The relevant area is shaded below:

The area of a rectangle with dimensions $15-4$ by $\frac{1}{20}$ is given by:

$\text{Area}=\left(15-4\right)\times\frac{1}{20}=0.55$

Therefore:

$P\left(4\le X\le15\right)=0.55$

Example 5

Let $X$ be a continuous random variable whose probability density function is $f\left(x\right)=3x^2$ on the interval $\left[0,1\right]$. What is $P\left(\frac{1}{2}\le X\le1\right)$?

Think: To find the probability we need to find the area under the curve, which involves finding the integral of $f\left(x\right)$ between $\frac{1}{2}$ and $1$.

Do:

$P\left(\frac{1}{2}\le X\le1\right)=\int_{\frac{1}{2}}^1\ 3x^2\ dx=\left[x^3\right]_{\frac{1}{2}}^1=1-\frac{1}{8}=\frac{7}{8}$
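The integral in Example 5 can be evaluated exactly in Python from the antiderivative $F\left(x\right)=x^3$ of $3x^2$. This is a minimal sketch with illustrative names; it also confirms that the total area on $\left[0,1\right]$ is $1$.

```python
from fractions import Fraction

def cumulative(x):
    """Antiderivative of f(x) = 3x^2: F(x) = x^3."""
    return Fraction(x) ** 3

# Total area under the density on [0, 1].
total_area = cumulative(1) - cumulative(0)

# P(1/2 <= X <= 1) as a difference of antiderivative values.
p = cumulative(1) - cumulative(Fraction(1, 2))

print(total_area, p)  # 1 7/8
```

Most of the probability lies in the upper half of the interval because the density $3x^2$ grows as $x$ approaches $1$.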

Practice questions

Question 1

Consider the probability density function $p\left(x\right)$ drawn below for a random variable $X$.

[Graph of $p\left(x\right)$]

  1. Calculate the area between $p\left(x\right)$ and the $x$-axis, without using integration. Show your working.

  2. Which features of $p\left(x\right)$ are also features of all continuous probability distribution functions? Select all options that apply.

    A. $p\left(x\right)$ is zero on both ends of the distribution.
    B. The area under $p\left(x\right)$ is a triangle.
    C. $p\left(x\right)$ is $0$ outside the region $0\le x\le5$.
    D. The area under $p\left(x\right)$ is equal to $1$.

  3. Calculate $P\left(X<3\right)$ using geometric reasoning.

  4. Calculate $P\left(X>3\mid X\le4\right)$ using geometric reasoning.

Question 2

Consider the probability density function $p\left(x\right)$ drawn below for a random variable $X$.

[Graph of $p\left(x\right)$]

  1. Calculate the area between $p\left(x\right)$ and the $x$-axis.

  2. Which feature(s) of $p\left(x\right)$ is also a feature of all probability distribution functions? Select all options that apply.

    A. $p\left(x\right)$ is positive for all values of $x$.
    B. $p\left(x\right)$ is defined in the region $-\infty<x<\infty$.
    C. $p\left(x\right)$ is only defined in the region $10\le x\le80$.
    D. The area under $p\left(x\right)$ is equal to $1$.

  3. Calculate $P\left(X\le54\right)$ using geometric reasoning.

  4. Calculate $P\left(X>34\right)$ using geometric reasoning.

  5. Calculate $P\left(44<X\le53\right)$ using geometric reasoning.

  6. Calculate $P\left(X\le56\mid X\ge44\right)$ using geometric reasoning.

Question 3

The probability density function of a random variable $X$ is drawn below. Its non-zero values lie in the region $0\le x\le k$.

[Graph of the probability density function of $X$]

  1. Calculate the value of $k$.

  2. What equation defines the probability distribution function of $X$ in the domain $0\le x\le k$?

  3. Calculate $P\left(X<0.8k\right)$, correct to two decimal places.

  4. Calculate $P\left(X\ge0.2k\right)$, correct to two decimal places.

  5. Calculate $P\left(X<7\mid X>2\right)$.

Question 4

The mass of a $6$ week old puppy, in grams, is modeled by a continuous random variable $X$ which has probability density function $p\left(x\right)$ defined by $p\left(x\right)=k\sin\left(\frac{\pi}{180}\left(x-450\right)\right)$ for $450\le x\le630$, and $p\left(x\right)=0$ otherwise.

  1. Use calculus techniques to determine the value of $k$. Leave your answer in exact form.

  2. Calculate the probability that a randomly chosen puppy weighs less than $490$ grams. Leave your answer in simplified exact form (your answer may involve a trigonometric function).

  3. Calculate the probability that a randomly chosen $6$ week old puppy weighs more than $495$ grams, if we are told that it weighs less than $585$ grams. Leave your answer as an exact value.

Outcomes

MA12-8

solves problems using appropriate statistical processes
