Similar to how discrete random variables relate to probability mass functions, continuous random variables relate to probability density functions. A probability density function defines the probability distribution of a continuous random variable and spans all possible values that the random variable can take on.
When graphed, a probability density function is a curve across all possible values the random variable can take on, and the total area under this curve adds up to 1.
The following image shows a probability density function. The highlighted area represents the probability of observing a value within the highlighted range.
In a probability density function, the probability of observing any single exact value is zero. This is because the area under the curve at a single point is always zero. The gif below showcases this.
As we can see from the visual above, as the interval becomes smaller, the width of the area under the curve becomes smaller as well. When trying to evaluate the area under the curve at a specific point, the width of that area becomes 0, and therefore the probability equals 0.
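We can calculate the area under the curve using the cumulative distribution function for the given probability distribution. The cumulative distribution function returns the probability of observing a value less than or equal to a given value, which is the area under the curve to the left of that value.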
For example, heights fall under a type of probability distribution called a normal distribution. The parameters for the normal distribution are the mean and the standard deviation, and we use the form Normal(mean, standard deviation) as shorthand.
We know that women’s heights have a mean of 167.64 cm with a standard deviation of 8 cm, which makes them fall under the Normal(167.64, 8) distribution.
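As a quick sanity check on the claim that the total area under a probability density function is 1, the curve for Normal(167.64, 8) can be integrated numerically. This is only a sketch, not part of the lesson's own code: it assumes SciPy is installed, the names MEAN_CM, SD_CM, and height_pdf are made up for illustration, and the integration limits of 10 standard deviations on either side of the mean are an assumption (the area outside that range is negligible).

```python
from scipy import integrate, stats

MEAN_CM = 167.64  # mean height from the text
SD_CM = 8.0       # standard deviation from the text


def height_pdf(x):
    """Density of Normal(MEAN_CM, SD_CM) at height x (illustrative helper)."""
    return stats.norm.pdf(x, loc=MEAN_CM, scale=SD_CM)


# Assumed finite limits: 10 standard deviations on each side of the mean.
lower = MEAN_CM - 10 * SD_CM
upper = MEAN_CM + 10 * SD_CM

# quad returns the estimated integral and an estimate of its absolute error.
total_area, abs_error = integrate.quad(height_pdf, lower, upper)
print(round(total_area, 6))
```

The printed value should be 1 to within numerical precision, which matches the statement that the area under the whole curve adds up to 1.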
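Let’s say we want to know the probability that a randomly chosen woman is less than 158 cm tall. We can use the cumulative distribution function to calculate the area under the probability density function curve to the left of 158 to find that probability. (A normal distribution has no lower bound, so this area covers every value below 158, not just the values from 0 to 158. For this distribution, the area below 0 is negligible.)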
We can calculate the area of the blue region in Python using the norm.cdf() method from the scipy.stats module. This method takes three arguments:
x: the value of interest
loc: the mean of the probability distribution
scale: the standard deviation of the probability distribution
import scipy.stats as stats

# stats.norm.cdf(x, loc, scale)
print(stats.norm.cdf(158, 167.64, 8))
Output:
# 0.1141
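The same method can also illustrate why the probability at a single point is zero. The area over a range is the cumulative distribution function at the upper end minus its value at the lower end, so shrinking the range shrinks the probability. The sketch below assumes the same Normal(167.64, 8) distribution; the helper name prob_between and the center value of 160 cm are hypothetical choices made for illustration.

```python
from scipy import stats

MEAN_CM = 167.64  # mean height from the text
SD_CM = 8.0       # standard deviation from the text


def prob_between(low, high):
    """P(low < X < high) for X ~ Normal(MEAN_CM, SD_CM), as a CDF difference."""
    upper_area = stats.norm.cdf(high, MEAN_CM, SD_CM)
    lower_area = stats.norm.cdf(low, MEAN_CM, SD_CM)
    return upper_area - lower_area


center = 160.0  # arbitrary height chosen for the demonstration
for half_width in [5.0, 1.0, 0.1, 0.001, 0.0]:
    low = center - half_width
    high = center + half_width
    # The probability shrinks with the interval and is 0 when low == high.
    print(f"{low:.3f} to {high:.3f} cm: {prob_between(low, high):.6f}")
```

Each printed probability is smaller than the one before it, and the final line, where the range has collapsed to the single value 160 cm, reports a probability of 0.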
Instructions
Following the same Normal(167.64, 8) distribution, assign the variable prob the probability that a randomly chosen woman is less than 175 cm tall. You should use the stats.norm.cdf() method.
Be sure to print prob.