So far, we’ve defined the term sampling distribution and shown how to simulate an approximate sampling distribution for a few different statistics (mean, maximum, variance, etc.). The Central Limit Theorem (CLT) lets us describe the sampling distribution of the mean specifically.
The CLT states that the sampling distribution of the mean is approximately normal as long as the population is not too skewed or the sample size is large enough. A sample size of n > 30 is usually a good rule of thumb for most population shapes, although a very skewed population may need more. If the population itself is normally distributed, the sampling distribution of the mean is normal for any sample size, so a smaller sample is fine.
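One way to see why both the population's skewness and the sample size matter is a standard result, stated here under the assumption of independent draws from a population with a finite third moment: if the population has skewness γ, the mean of n draws has skewness γ/√n. For example, an exponential population has γ = 2.

```latex
\gamma_{\bar{X}_n} = \frac{\gamma}{\sqrt{n}},
\qquad \text{e.g. } \gamma = 2:\quad
\gamma_{\bar{X}_{30}} = \frac{2}{\sqrt{30}} \approx 0.37,
\qquad
\gamma_{\bar{X}_{100}} = \frac{2}{\sqrt{100}} = 0.2
```

A more skewed population (larger γ) therefore needs a larger n before the skew of the sampling distribution fades. Skewness is only one aspect of normality, so treat this as a heuristic behind the n > 30 rule of thumb rather than a proof of the CLT.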
Let’s take another look at the salmon weight data to see how the CLT applies here. The first plot below shows the population distribution. Salmon weight is skewed right, meaning the tail of the distribution is longer on the right than on the left.
Next, we’ve simulated a sampling distribution of the mean (using a sample size of 100) and superimposed a normal curve on it. Note how closely the estimated sampling distribution follows the normal curve.
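The salmon data and plotting code are not reproduced here, so the sketch below is only an illustration under stated assumptions: it uses a hypothetical right-skewed population drawn from a lognormal distribution in place of the real salmon weights, and only the Python standard library. Instead of drawing a curve, it checks the normal fit numerically by comparing the share of simulated sample means within 1 and 2 standard deviations against the roughly 68% and 95% a normal distribution would give.

```python
import random
import statistics

random.seed(1)

# Hypothetical right-skewed stand-in for salmon weights (lognormal); NOT the lesson's data.
population = [random.lognormvariate(1.0, 0.5) for _ in range(10_000)]

sample_size = 100   # same sample size as the simulation described above
num_samples = 2_000  # number of simulated samples (an arbitrary choice)

# Simulate the sampling distribution of the mean.
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(num_samples)
]

# Fit a normal curve with the same mean and standard deviation as the simulated means.
fitted = statistics.NormalDist.from_samples(sample_means)


def fraction_within(k):
    """Share of simulated means lying within k standard deviations of their mean."""
    lo = fitted.mean - k * fitted.stdev
    hi = fitted.mean + k * fitted.stdev
    return sum(lo <= m <= hi for m in sample_means) / len(sample_means)


std_normal = statistics.NormalDist()
for k in (1, 2):
    expected = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: simulated {fraction_within(k):.3f}, normal {expected:.3f}")
```

If the simulated shares sit close to the normal values even though the population is skewed, that is the CLT at work for this sample size.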
Note that the CLT applies only to the sampling distribution of the mean, not to other statistics such as the maximum, minimum, and variance!
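To illustrate that the CLT says nothing about other statistics, the sketch below simulates the sampling distributions of both the mean and the maximum from the same hypothetical skewed population (exponential, standing in for real data) and compares their skewness. A normal distribution has skewness 0; with a sample size of 100, the means should come out close to 0, while the maxima should stay clearly right-skewed.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed population standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - mu) ** 3 for v in values]) / sd ** 3


random.seed(2)

# Hypothetical right-skewed population (exponential); not the lesson's data.
population = [random.expovariate(1.0) for _ in range(100_000)]

sample_size = 100
num_samples = 2_000
means, maxima = [], []
for _ in range(num_samples):
    sample = random.sample(population, sample_size)
    means.append(statistics.fmean(sample))  # statistic the CLT describes
    maxima.append(max(sample))              # statistic the CLT does not describe

print(f"skewness of sample means:  {skewness(means):.2f}")
print(f"skewness of sample maxima: {skewness(maxima):.2f}")
```

Increasing the sample size would push the skewness of the means further toward 0, but it would not make the distribution of the maxima normal.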
Instructions
In order to see the Central Limit Theorem in action, let’s look at another population of fish that is not normally distributed.
We have loaded data on the weights of cod into the workspace.
Uncomment the three lines underneath ## Checkpoint 1 to see a plot of the cod weight distribution. Note the shape of the distribution.
Now that we have seen the skewed population distribution, let’s simulate a sampling distribution of the mean. According to the CLT, the sampling distribution will look approximately normal once the sample size is large enough. To start, we have set the sample size to 6.
Uncomment the five lines at the very bottom, run the code once, and take a look at the sampling distribution.
Remember to scroll down to see the second plot.
Now change the sample size to 50 and run the code again. Does the estimated sampling distribution look more normal now?
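The workspace code is not shown here, so the following is a standalone sketch of the same experiment rather than the exercise's own solution. It assumes a hypothetical right-skewed cod population drawn from a gamma distribution (shape 1.5, so a population skewness of about 1.6) and uses the skewness of the simulated sample means as a rough, plot-free measure of how far the sampling distribution is from normal (0 for a normal curve). The helper names `skewness` and `simulate_sample_means` are made up for illustration.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed population standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - mu) ** 3 for v in values]) / sd ** 3


def simulate_sample_means(population, sample_size, num_samples=2_000):
    """Hypothetical helper: means of repeated random samples drawn without replacement."""
    return [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(num_samples)
    ]


random.seed(3)

# Hypothetical right-skewed stand-in for the cod weights; not the workspace data.
cod_population = [random.gammavariate(1.5, 2.0) for _ in range(10_000)]

results = {}
for n in (6, 50):
    results[n] = skewness(simulate_sample_means(cod_population, n))
    print(f"n = {n:2d}: skewness of sample means = {results[n]:.2f}")
```

With this stand-in population, the skewness at n = 50 should be noticeably closer to 0 than at n = 6, matching what the histograms in the exercise are meant to show.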