According to the Central Limit Theorem, the mean of the sampling distribution of the mean is equal to the population mean. This is the case for some, but not all, sampling distributions. Remember, you can have a sampling distribution for any sample statistic, including:
- mean
- variance
- max / min
Because the mean of the sampling distribution of the mean is equal to the mean of the population, we call the mean an unbiased estimator. In general, a statistic is called an unbiased estimator of a population parameter if the mean of the sampling distribution of the statistic is equal to the value of that parameter for the population.
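We can sanity-check this with a small simulation. The population below is a hypothetical one built with NumPy (not the lesson's dataset): the average of many sample means lands right at the population mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 10,000 draws from an exponential distribution.
population = rng.exponential(scale=10, size=10_000)

# Sampling distribution of the mean: take many samples of size 50
# and record the mean of each one.
sample_means = [rng.choice(population, size=50).mean() for _ in range(5_000)]

# The mean of the sampling distribution sits at the population mean,
# which is exactly what "unbiased estimator" means.
print(np.mean(sample_means))
print(population.mean())
```

The two printed values agree closely, and the agreement gets tighter as you take more samples.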
The maximum is one example of a biased estimator, meaning that the mean of the sampling distribution of the maximum is not equal to the population maximum.
In the workspace, you can see the sampling distribution of the maximum. The mean of the distribution is not equal to the maximum of the population, showing that it is a biased estimator.
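You can reproduce that behavior outside the workspace with a quick simulation (a sketch using a made-up population, not the lesson's data). A sample's maximum can never exceed the population maximum, so the sample maxima systematically undershoot it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population of 10,000 values.
population = rng.normal(loc=50, scale=10, size=10_000)

# Sampling distribution of the maximum from samples of size 30.
sample_maxes = [rng.choice(population, size=30).max() for _ in range(5_000)]

# The mean of the sample maxima falls well below the population maximum,
# so the maximum is a biased estimator.
print(np.mean(sample_maxes))
print(population.max())
```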
Let’s look at another example. Edit the function app_statistic() so that it returns the variance using the NumPy function np.var(). (You can change the string as well to update the title of your plots.)
Based on the resulting mean of the sampling distribution, would you say that variance is a biased or unbiased estimator?
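One detail worth knowing: by default, np.var() divides by n (its ddof parameter defaults to 0), which produces a biased estimate of the population variance; passing ddof=1 applies Bessel's correction and removes the bias. A small simulation on a hypothetical population shows the difference:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population with variance near 100.
population = rng.normal(loc=0, scale=10, size=10_000)

# Sampling distribution of the variance, computed both ways,
# from samples of size 10.
biased   = [np.var(rng.choice(population, size=10))         for _ in range(5_000)]
unbiased = [np.var(rng.choice(population, size=10), ddof=1) for _ in range(5_000)]

print(np.mean(biased))     # tends to undershoot the population variance
print(np.mean(unbiased))   # centers near the population variance
print(np.var(population))  # the population variance itself
```

With sample size n, the ddof=0 estimate averages out to roughly (n - 1)/n times the population variance, which is why the undershoot is most visible for small samples.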
Change the statistic to mean using the NumPy function np.mean(). Does what you see correspond with what we know about biased and unbiased estimators?
Feel free to try out other statistics in the workspace.