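According to the Central Limit Theorem, for a sufficiently large sample size the sampling distribution of the mean is approximately normally distributed, is centered at the population mean, and has a standard deviation equal to the population standard deviation divided by the square root of the sample size.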
In the plots provided, the left plot shows the population distribution of salmon weights, and the right plot shows the sampling distribution of the mean salmon weights.
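The behavior described above can be checked with a small simulation. This is a minimal sketch, not the code behind the plots: the right-skewed `population`, the helper `sample_means`, and all of the numbers are hypothetical stand-ins for the salmon weights.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed population standing in for the salmon weights
population = [random.expovariate(1 / 60) for _ in range(20_000)]


def sample_means(pop, samp_size, n_samples=2_000):
    """Draw n_samples random samples of size samp_size and return each sample's mean."""
    return [statistics.fmean(random.sample(pop, samp_size)) for _ in range(n_samples)]


means = sample_means(population, samp_size=50)

pop_mean = statistics.fmean(population)
mean_of_means = statistics.fmean(means)

# In a symmetric distribution the mean and median nearly coincide,
# so the gap between them is a rough indicator of skew.
pop_gap = abs(pop_mean - statistics.median(population))
means_gap = abs(mean_of_means - statistics.median(means))

print(f"population mean:      {pop_mean:.2f}")
print(f"mean of sample means: {mean_of_means:.2f}")
print(f"mean-median gap, population:   {pop_gap:.2f}")
print(f"mean-median gap, sample means: {means_gap:.2f}")
```

The sample means should cluster around the population mean, and their mean-median gap should be much smaller than the population's. Both are consistent with a sampling distribution that is close to normal even though the population is skewed.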
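When you increase the sample size, the standard error of the mean decreases. This can be seen from the formula:
$$\text{standard error} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the population standard deviation and $n$ is the sample size.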
As sample size increases, the denominator increases while the numerator remains constant.
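To make the relationship concrete, the sketch below evaluates the standard error for a few sample sizes. The function name `standard_error` and the population standard deviation of 20 are assumptions chosen for illustration.

```python
def standard_error(std_dev, samp_size):
    """Standard error of the mean: population standard deviation over sqrt(sample size)."""
    return std_dev / samp_size ** 0.5


std_dev = 20  # assumed population standard deviation

for samp_size in (10, 40, 160, 640):
    print(f"n = {samp_size:>3}: standard error = {standard_error(std_dev, samp_size):.2f}")
```

Because the sample size sits under a square root, quadrupling it only halves the standard error.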
A biased estimator is a statistic whose sampling distribution has a mean that is not equal to the value of that statistic for the population.
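The minimum is an example of a biased estimator: a sample minimum can never be smaller than the population minimum and is usually larger, so the mean of the sample minimums overestimates it. The variance calculated with the population formula (dividing by the sample size) is another example of a biased estimator, since on average it underestimates the population variance. This is shown in the provided plot.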
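A simulation makes the bias visible. This is a sketch under assumed values: the normal `population`, the sample size of 10, and the number of samples are hypothetical. `statistics.pvariance` divides by the sample size, while `statistics.variance` divides by the sample size minus one.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of weights
population = [random.gauss(60, 20) for _ in range(20_000)]

# Many small random samples from that population
samples = [random.sample(population, 10) for _ in range(5_000)]

# Average each statistic over all samples to estimate the mean
# of its sampling distribution
mean_of_minimums = statistics.fmean([min(s) for s in samples])
mean_of_pvariances = statistics.fmean([statistics.pvariance(s) for s in samples])
mean_of_variances = statistics.fmean([statistics.variance(s) for s in samples])

pop_minimum = min(population)
pop_variance = statistics.pvariance(population)

print(f"population minimum: {pop_minimum:.1f}, mean of sample minimums: {mean_of_minimums:.1f}")
print(f"population variance: {pop_variance:.1f}")
print(f"mean sample variance, dividing by n:     {mean_of_pvariances:.1f}")
print(f"mean sample variance, dividing by n - 1: {mean_of_variances:.1f}")
```

The mean of the sample minimums should land well above the population minimum, and the divide-by-n variance should average below the population variance. Dividing by n - 1 should bring the average close to the population value.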
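If we want to know the probability that a sample from a population will have a mean in some specific range, we can calculate the standard error from the population standard deviation and the sample size, then use the cumulative distribution function (CDF) of a normal distribution with the population mean and that standard error. This assumes the sampling distribution of the mean is approximately normal.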
The code block given shows how to do this using Python.
from scipy import stats

# calculate standard error using population standard deviation and sample size
standard_error = std_dev / (samp_size ** 0.5)

# use the cdf scipy method to calculate the probability of observing some value x or lower
stats.norm.cdf(x, mean, standard_error)
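The snippet above gives the probability of a sample mean at or below `x`. For a range, one approach is to subtract the CDF at the lower bound from the CDF at the upper bound. The sketch below does that with assumed values: the population mean of 60, standard deviation of 20, sample size of 25, and the bounds `lower` and `upper` are all hypothetical.

```python
from scipy import stats

# Hypothetical population parameters and sample size
mean = 60
std_dev = 20
samp_size = 25

# Standard error of the mean
standard_error = std_dev / (samp_size ** 0.5)

# Hypothetical range of interest for the sample mean
lower, upper = 55, 65

# P(lower <= sample mean <= upper) = CDF(upper) - CDF(lower)
prob_in_range = (
    stats.norm.cdf(upper, mean, standard_error)
    - stats.norm.cdf(lower, mean, standard_error)
)

print(f"P({lower} <= sample mean <= {upper}) = {prob_in_range:.3f}")
```

The probability of a sample mean above some value can be found the same way, as one minus the CDF at that value.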
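The CLT holds true if the sample size is sufficiently large (a common rule of thumb is n > 30) or the population itself is normally distributed.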
Since we often don’t know the distribution of the population, it is safer to always make sure to have a sufficiently large sample size.
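The standard deviation of the sampling distribution of the mean is also known as the standard error of the mean. The standard error for a sample mean can be calculated with the following formula:
$$\text{standard error} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma$ is the population standard deviation and $n$ is the sample size.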
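The formula can be compared against a simulated sampling distribution. This is a rough sketch with assumed values: the `population`, the sample size of 40, and the number of repeated samples are hypothetical.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical population and sample size
population = [random.gauss(60, 20) for _ in range(20_000)]
samp_size = 40

# Simulated sampling distribution: the mean of each of many random samples
sample_means = [
    statistics.fmean(random.sample(population, samp_size)) for _ in range(3_000)
]

# Standard deviation of the simulated sampling distribution
empirical_se = statistics.pstdev(sample_means)

# Standard error from the formula
formula_se = statistics.pstdev(population) / samp_size ** 0.5

print(f"standard deviation of sample means: {empirical_se:.2f}")
print(f"standard error from the formula:    {formula_se:.2f}")
```

The two values should agree closely, with small differences due to simulation noise.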