
8.02 Cumulative distribution functions

Lesson

In the previous lesson, we calculated probabilities for a continuous random variable by integrating its probability density function. The Cumulative Distribution Function (CDF) provides a general formula for finding these probabilities without repeating the integration for each new cutoff. The CDF is essentially the primitive function (antiderivative) of the probability density function, chosen so that it equals $0$ at the lower end of the domain.

The CDF gives us the probability of a random variable being less than or equal to a given cutoff.

We can use the CDF to find probabilities, measures of location and quantiles.

Cumulative distribution function (CDF)

For a continuous random variable $X$ that takes values in a closed interval $\left[a,b\right]$, the cumulative distribution function (CDF) is

$F\left(x\right)=P\left(a\le X\le x\right)$ for all $x$ in the domain $\left[a,b\right]$

$F\left(x\right)=\int_a^x\ f\left(t\right)\ dt$ where $f\left(t\right)$ is the probability density function defined in the domain $\left[a,b\right]$

An identity that may prove to be useful here is:

$\int_a^b\ f\left(t\right)\ dt=\int_a^c\ f\left(t\right)\ dt+\int_c^b\ f\left(t\right)\ dt$ for $a\le c\le b$

In particular, we can make good use of this when $f\left(t\right)$ is a piecewise function and $x=c$ is the boundary value at which $f\left(t\right)$ changes from one sub-function to another.
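The definition and the splitting of an integral at a boundary value can both be checked numerically. The sketch below is illustrative only: the helper names `integrate` and `cdf`, the use of composite Simpson's rule, and the sample density $f\left(t\right)=2t$ on $\left[0,1\right]$ (whose CDF is $F\left(x\right)=x^2$) are assumptions made for this example and are not part of the lesson.

```python
def integrate(f, lo, hi, n=1000):
    """Composite Simpson's rule for the integral of f from lo to hi (n even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and 2 (even i).
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def sample_pdf(t):
    """Hypothetical density for illustration: f(t) = 2t on [0, 1]."""
    return 2 * t if 0 <= t <= 1 else 0.0


def cdf(x, a=0.0):
    """F(x) = integral of the density from the lower bound a up to x."""
    return integrate(sample_pdf, a, x)


x, c = 0.8, 0.3
whole = cdf(x)  # integral from 0 to 0.8 in one piece
split = integrate(sample_pdf, 0, c) + integrate(sample_pdf, c, x)
print(whole, split)  # both should be close to 0.8**2 = 0.64
```

Splitting at $c$ makes no difference for a single-formula density like this one, but it is the step that lets each piece of a piecewise density be integrated with its own formula.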

Worked examples

Example 1

A probability density function is defined piecewise by:

$f\left(x\right)=\begin{cases}k\left(5+x\right), & -3\le x\le0\\ k\left(5-x\right), & 0\le x\le3\\ 0, & \text{elsewhere}\end{cases}$

(a) Find the value of the constant $k$, and hence, write the equation of $f\left(x\right)$.

Think: The integral of $f\left(x\right)$ over the domain $\left[-3,3\right]$ must be $1$ because it is a probability density function. We can integrate the piecewise function by integrating the separate pieces over their respective domains. Then solve for $k$ by equating the integral to $1$.

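Do:

$\int_{-3}^0\ k\left(5+x\right)\ dx+\int_0^3\ k\left(5-x\right)\ dx$ $=$ $1$
$k\left[5x+\frac{x^2}{2}\right]_{-3}^0+k\left[5x-\frac{x^2}{2}\right]_0^3$ $=$ $1$
$\frac{21}{2}k+\frac{21}{2}k$ $=$ $1$
$21k$ $=$ $1$

For the integral to be $1$, the value of $k$ must be $\frac{1}{21}$.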

The function is therefore:

$f\left(x\right)=\begin{cases}\frac{1}{21}\left(5+x\right), & -3\le x\le0\\ \frac{1}{21}\left(5-x\right), & 0\le x\le3\\ 0, & \text{elsewhere}\end{cases}$

(b) Find the cumulative distribution function, $F\left(x\right)$, for the probability density function given.

Think: Just as the probability density function is split into two, to find the cumulative distribution function we will find the function over each interval and then combine.

Do:

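For $-3\le x\le0$:

$F\left(x\right)$ $=$ $\int_{-3}^x\ \frac{1}{21}\left(5+t\right)\ dt$
  $=$ $\frac{1}{21}\left[5t+\frac{t^2}{2}\right]_{-3}^x$
  $=$ $\frac{1}{21}\left(5x+\frac{x^2}{2}+\frac{21}{2}\right)$

In particular, $F\left(0\right)=\frac{1}{21}\times\frac{21}{2}=\frac{1}{2}$.

For $0\le x\le3$, $F\left(x\right)$ gives the area under the curve up to $x$, so we require the area up to $x=0$ plus the area under the second curve from $0$ to $x$. Since $F\left(0\right)=\frac{1}{2}$ (from above), we have:

$F\left(x\right)$ $=$ $\frac{1}{2}+\int_0^x\ \frac{1}{21}\left(5-t\right)\ dt$
  $=$ $\frac{1}{2}+\frac{1}{21}\left(5x-\frac{x^2}{2}\right)$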

Hence, the cumulative distribution function is:

$F\left(x\right)=\begin{cases}0, & x<-3\\ \frac{1}{21}\left(5x+\frac{x^2}{2}+\frac{21}{2}\right), & -3\le x\le0\\ \frac{1}{2}+\frac{1}{21}\left(5x-\frac{x^2}{2}\right), & 0\le x\le3\\ 1, & x>3\end{cases}$
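As a quick check of Example 1, the piecewise CDF can be evaluated exactly. This is a minimal sketch using Python's `fractions` module. The function name `F` mirrors the lesson's notation, and the code assumes $F\left(x\right)=0$ below $-3$ and $F\left(x\right)=1$ above $3$, since no probability has accumulated before the domain starts and all of it has accumulated once the domain ends.

```python
from fractions import Fraction

K = Fraction(1, 21)  # value of k found in part (a)


def F(x):
    """Piecewise CDF from Example 1, evaluated exactly with fractions."""
    x = Fraction(x)
    if x < -3:
        return Fraction(0)
    if x <= 0:
        return K * (5 * x + x**2 / 2 + Fraction(21, 2))
    if x <= 3:
        return Fraction(1, 2) + K * (5 * x - x**2 / 2)
    return Fraction(1)


print(F(-3), F(0), F(3))  # 0 1/2 1
print(F(1) - F(-1))       # P(-1 <= X <= 1) = 3/7
```

The two pieces agree at $x=0$, and the CDF rises from $0$ to $1$ across the domain, which is what a valid CDF should do.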

 

Example 2

A probability density function is given by $f\left(x\right)=\frac{4x^3}{255}$ on the domain $\left[1,4\right]$, with $f\left(x\right)=0$ for all other $x$.

(a) Find the cumulative distribution function.

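Think: The CDF is found by integrating $f\left(x\right)$ from the lower end of the domain, $1$, up to $x$.

Do:

$F\left(x\right)$ $=$ $\int_1^x\ \frac{4t^3}{255}\ dt$
  $=$ $\left[\frac{t^4}{255}\right]_1^x$
  $=$ $\frac{x^4-1}{255}$ for $1\le x\le4$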

(b) Use the CDF to find $P\left(X\le3\right)$.

Think: $P\left(X\le3\right)$ is the area under the function to the left of $x=3$.

Do: Using $F\left(x\right)=\frac{x^4-1}{255}$, we substitute $x=3$:

$P\left(X\le3\right)$ $=$ $F(3)$
  $=$ $\frac{3^4-1}{255}$
  $=$ $\frac{81-1}{255}$
  $=$ $\frac{80}{255}$
  $=$ $\frac{16}{51}$

Therefore, $P\left(X\le3\right)=\frac{16}{51}$.

(c) Use the CDF to find $P\left(1.5\le X\le3.1\right)$.

Think: The area under the curve that we are interested in is found by calculating the integral between $x=1.5$ and $x=3.1$, or simply by finding $F\left(3.1\right)-F\left(1.5\right)$ using the CDF.

Do:

$P\left(1.5\le X\le3.1\right)$ $=$ $F\left(3.1\right)-F\left(1.5\right)$
  $=$ $\frac{3.1^4-1}{255}-\frac{1.5^4-1}{255}$
  $\approx$ $0.3582-0.0159$
  $\approx$ $0.342$ (to three decimal places)
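The arithmetic in parts (b) and (c) can be reproduced in a few lines. This sketch assumes the CDF $F\left(x\right)=\frac{x^4-1}{255}$ on $\left[1,4\right]$ from part (a). The function name `F` and the use of exact fractions are choices made for illustration.

```python
from fractions import Fraction


def F(x):
    """CDF from Example 2: F(x) = (x^4 - 1) / 255 on [1, 4]."""
    x = Fraction(x)
    if x < 1:
        return Fraction(0)
    if x > 4:
        return Fraction(1)
    return (x**4 - 1) / 255


p_b = F(3)                                     # part (b)
p_c = F(Fraction("3.1")) - F(Fraction("1.5"))  # part (c)
print(p_b)                   # 16/51
print(round(float(p_c), 3))  # 0.342
```

Subtracting two CDF values gives the probability of the interval between them, so no new integration is needed once $F\left(x\right)$ is known.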

Finding the mode

The mode is the data value with the highest frequency. For a continuous distribution, we look for the value of $x$ that gives the maximum point of a probability density function. Depending on the function, we may need to use calculus to help us find where the maximum value occurs.

Worked example

Example 3

A continuous probability distribution $f\left(x\right)=\frac{3x\left(6-x\right)}{100}$ is defined in the domain $\left[1,6\right]$. Find the mode of the distribution.

Think: The mode is the value of $x$ which gives the maximum point of the probability density function. We can use calculus to find the first derivative and solve $f'\left(x\right)=0$ to find the stationary point, then check that it lies within the given domain. Then we can use the second derivative to test that it is a maximum. The function is a concave down parabola because the coefficient of $x^2$ is negative, so we can expect there to be a maximum point.

Do:

Differentiating:

$f\left(x\right)$ $=$ $\frac{3x\left(6-x\right)}{100}$
  $=$ $\frac{1}{100}\left(18x-3x^2\right)$
$\therefore f'\left(x\right)$ $=$ $\frac{1}{100}\left(18-6x\right)$

Solving $f'\left(x\right)=0$ for the stationary point:

$\frac{1}{100}\left(18-6x\right)$ $=$ $0$

Multiply both sides by $100$

$18-6x$ $=$ $0$

$6x$ $=$ $18$

$x$ $=$ $3$

Therefore, there is a stationary point when $x=3$, which is within the domain of the probability density function. So if we confirm this stationary point is a maximum, we have found our mode.

Differentiating again to find the second derivative:

$f''\left(x\right)=-\frac{6}{100}$

At the point $x=3$:

$f''\left(3\right)=-\frac{6}{100}<0$

Therefore, since the graph is concave down, the maximum value does indeed occur at $x=3$ and this is the mode of the distribution.
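When calculus is awkward, the mode can also be located numerically. The sketch below swaps the derivative test for a ternary search, which assumes the density has a single peak on the interval. The helper name `argmax_unimodal` and the tolerance are illustrative choices, not part of the lesson.

```python
def pdf(x):
    """Density from Example 3: f(x) = 3x(6 - x)/100 on [1, 6]."""
    return 3 * x * (6 - x) / 100 if 1 <= x <= 6 else 0.0


def argmax_unimodal(f, lo, hi, tol=1e-6):
    """Ternary search for the maximiser of a single-peaked f on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1  # the peak lies to the right of m1
        else:
            hi = m2  # the peak lies to the left of m2
    return (lo + hi) / 2


mode = argmax_unimodal(pdf, 1, 6)
print(round(mode, 4))  # 3.0
```

The numerical answer agrees with the stationary point $x=3$ found by differentiation. For a density with more than one peak, this search could settle on a local maximum, so the calculus approach (or a plot) is still needed to confirm the result.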

Practice questions

Question 1

For a random variable, consider the following probability density function.

$f\left(x\right)=\begin{cases}\frac{5x^4}{7775} & \text{for } 1\le x\le6\\ 0 & \text{otherwise}\end{cases}$
  1. State the cumulative distribution function $F\left(x\right)$ over $1\le x\le6$ where $F\left(x\right)=0$ for $x<1$ and $F\left(x\right)=1$ for $x>6$.

    Use $C$ as the constant of integration.

  2. Find $P\left(X\le2\right)$.

  3. Find $P\left(X<5\right)$.

  4. Find $P\left(2\le X\le4\right)$.

Question 2

Find the mode of the following probability density functions.

  1. $f\left(x\right)=\begin{cases}\frac{3\left(9+8x-x^2\right)}{434} & \text{for } 0\le x\le7\\ 0 & \text{otherwise}\end{cases}$
  2. $f\left(x\right)=\begin{cases}\frac{4e^{4x}}{e^8\left(e^{16}-1\right)} & \text{for } 2\le x\le6\\ 0 & \text{otherwise}\end{cases}$

Finding quantiles using the CDF

We know that the CDF gives us the probability that the random variable is at most a given value. We also know the area under the probability density function is $1$. Knowing this, we can find various quantiles of the distribution by solving $F\left(x\right)=p$ for the required proportion $p$.

Median

Because the area under a probability density function is $1$, it follows that the area on either side of the median value of a continuous probability distribution must be $0.5$.

Using the CDF, the median is the value of $x$ where $F\left(x\right)=\int_a^x\ f\left(t\right)\ dt=0.5$, where $f\left(t\right)$ is the probability density function defined in the domain $\left[a,b\right]$.

Worked example

Example 4

Find the median of the continuous probability distribution defined as $f\left(x\right)=\frac{1}{24}\left(x+3\right)$ in the domain $\left[1,5\right]$.

Think: We want to find $x$ such that $\int_1^x\ f\left(t\right)\ dt=0.5$. We can do this by finding the cumulative distribution function $F\left(x\right)$ and then solving $F\left(x\right)=0.5$.

Do: Integrating $f\left(t\right)=\frac{1}{24}(t+3)$ from $1$ to $x$:

$F\left(x\right)$ $=$ $\int_1^x\ \frac{1}{24}\left(t+3\right)\ dt$
  $=$ $\frac{1}{24}\left[\frac{t^2}{2}+3t\right]_1^x$
  $=$ $\frac{1}{24}\left(\frac{x^2}{2}+3x-\frac{7}{2}\right)$

Solving for $F\left(x\right)=\frac{1}{2}$:

$\frac{1}{24}\left(\frac{x^2}{2}+3x-\frac{7}{2}\right)$ $=$ $\frac{1}{2}$

First multiply both sides by $24$

$\frac{x^2}{2}+3x-\frac{7}{2}$ $=$ $12$

Next take $12$ from both sides

$\frac{x^2}{2}+3x-\frac{31}{2}$ $=$ $0$

Now multiply both sides by $2$ to simplify

$x^2+6x-31$ $=$ $0$

Finally, use technology or the quadratic formula to solve

$\therefore x$ $=$ $-3\pm2\sqrt{10}$

Since $1\le x\le5$, $x=-3+2\sqrt{10}\approx3.32$.

Hence, the median is approximately $3.32$.
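The last step ("use technology or the quadratic formula") might look like this in practice. The sketch assumes the CDF $F\left(x\right)=\frac{1}{24}\left(\frac{x^2}{2}+3x-\frac{7}{2}\right)$ used in the working above. The variable names are illustrative.

```python
import math


def F(x):
    """CDF from Example 4: F(x) = (x^2/2 + 3x - 7/2) / 24 on [1, 5]."""
    return (x**2 / 2 + 3 * x - 3.5) / 24


# Roots of x^2 + 6x - 31 = 0 from the quadratic formula.
a, b, c = 1, 6, -31
disc = b * b - 4 * a * c
roots = [(-b + s * math.sqrt(disc)) / (2 * a) for s in (1, -1)]

# Keep only the root that lies inside the domain [1, 5].
median = next(r for r in roots if 1 <= r <= 5)
print(round(median, 2), round(F(median), 6))  # 3.32 0.5
```

Substituting the root back into $F$ confirms that half of the probability lies to its left, while the negative root is rejected because it falls outside the domain.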

Quartiles

Quartiles are the upper limits of particular proportions of a data set. Specifically, $Q_1$ is the value below which the first $25\%$ of the data set lies and $Q_3$ is the value below which the first $75\%$ of the data set lies. So when we want to find the lower quartile, $Q_1$, for example, we solve $F\left(x\right)=0.25$.

Deciles and percentiles

Deciles divide a data set into ten parts and percentiles divide a data set into one hundred parts. Therefore to find, for example, the 6th decile we solve $F\left(x\right)=0.6$. Similarly, to find the 78th percentile we solve $F\left(x\right)=0.78$.
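Every quantile comes from the same kind of equation, $F\left(x\right)=p$, so one routine can handle quartiles, deciles and percentiles alike. The sketch below solves the equation by bisection, which relies on $F$ being increasing on the domain. The helper name `quantile` is hypothetical, and the CDF is the one from Example 2, $F\left(x\right)=\frac{x^4-1}{255}$ on $\left[1,4\right]$, chosen only because its quantiles have a closed form to compare against.

```python
def quantile(cdf, p, lo, hi, tol=1e-10):
    """Solve cdf(x) = p on [lo, hi] by bisection; cdf must be increasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid  # too little probability so far: move right
        else:
            hi = mid  # enough probability already: move left
    return (lo + hi) / 2


def F(x):
    """CDF from Example 2: F(x) = (x^4 - 1) / 255 on [1, 4]."""
    return (x**4 - 1) / 255


q1 = quantile(F, 0.25, 1, 4)   # lower quartile
d6 = quantile(F, 0.60, 1, 4)   # 6th decile
p78 = quantile(F, 0.78, 1, 4)  # 78th percentile
print(q1, d6, p78)
```

For this CDF the equation can also be rearranged by hand to $x=\left(1+255p\right)^{1/4}$, which gives a way to confirm the numerical values.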

In summary
  • The Cumulative Distribution Function (CDF), $F\left(x\right)$, is the general formula to find the probabilities of continuous probability distributions.

$F\left(x\right)=\int_a^x\ f\left(t\right)\ dt$ where $f\left(t\right)$ is the probability density function defined in the domain $\left[a,b\right]$

  • The mode is the value of $x$ at the maximum point of a continuous probability distribution. It can be found by calculus or by locating the highest point on the graph of the probability density function.
  • We can use the CDF to find probabilities of intervals and also various quantiles by using the fact that the area under the probability density function is $1$ and solving for the particular quantile. For example, to find the median we solve $F\left(x\right)=0.5$, to find the $7$th decile we solve $F\left(x\right)=0.7$, and so on.

Practice question

Question 3

For the following probability density function, find:

$f\left(x\right)=\begin{cases}\frac{x^2}{168} & \text{for } 2\le x\le8\\ 0 & \text{elsewhere}\end{cases}$
  1. the median, $m$.

    Round your answer to two decimal places.

  2. the $3$rd quartile, $q$.

    Round your answer to two decimal places.

  3. the $67$th percentile, $p$.

    Round your answer to two decimal places.

  4. the $8$th decile, $r$.

    Round your answer to two decimal places.

Outcomes

MA12-8

solves problems using appropriate statistical processes
