As we saw in the last example, each time we sample from a population, we will get a slightly different sample mean. In order to understand how much variation we can expect in those sample means, we can do the following:
- Take a bunch of random samples of fish, each of the same size (50 fish in this example)
- Calculate the sample mean for each one
- Plot a histogram of all the sample means
This process gives us an estimate of the sampling distribution of the mean for a sample size of 50 fish.
The code to accomplish this is shown below:
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    # `population` is assumed to be a DataFrame that has already been loaded
    salmon_population = population['Salmon_Weight']
    sample_size = 50
    sample_means = []

    # loop 500 times to get 500 random sample means
    for i in range(500):
        # take a sample from the data:
        samp = np.random.choice(salmon_population, sample_size, replace = False)
        # calculate the mean of this sample:
        this_sample_mean = np.mean(samp)
        # append this sample mean to a list of sample means
        sample_means.append(this_sample_mean)

    # plot all the sample means to show the sampling distribution
    sns.histplot(sample_means, stat='density')
    plt.title("Sampling Distribution of the Mean")
    plt.show()
The histogram of `sample_means` shows the estimated sampling distribution of the mean.
Note that we can look at a sampling distribution for any statistic. For example, we could estimate the sampling distribution of the maximum by calculating the maximum of each sample, rather than the mean (as shown above).
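As a rough sketch of that idea, the loop below records the maximum of each sample instead of the mean. The salmon weights here are a synthetic stand-in (the normal distribution and its parameters are assumptions made only so the example runs on its own), and `sample_maximums` is a hypothetical variable name, not one from the lesson.

```python
import numpy as np

# Assumption: synthetic stand-in for the salmon weights, since the lesson's
# `population` data is not available here. The values are made up.
np.random.seed(0)
salmon_population = np.random.normal(loc=60, scale=10, size=10_000)

sample_size = 50
sample_maximums = []  # hypothetical name, mirrors `sample_means`

# loop 500 times to get 500 random sample maximums
for i in range(500):
    # take a sample from the data, without replacement
    samp = np.random.choice(salmon_population, sample_size, replace=False)
    # record the maximum of this sample instead of the mean
    sample_maximums.append(np.max(samp))

# summarize the estimated sampling distribution of the maximum
print("number of sample maximums:", len(sample_maximums))
print("smallest and largest sample maximum:", min(sample_maximums), max(sample_maximums))
```

Passing `sample_maximums` to `sns.histplot` in place of `sample_means` would plot this distribution the same way as before.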
Let’s estimate the sampling distribution of the mean using a population of cod fish. As we did with salmon fish, we will pretend we are all-knowing and have captured weight data on every cod fish in the ocean. In the workspace, we’ve loaded in the cod weight data.
We’ve set the sample size equal to 50 and created a for loop to take 500 random samples.
- Inside the for loop, use the function `np.mean()` to calculate the mean of each sample. Save this to a variable called `this_sample_mean`.
- Then, still inside the for loop, append `this_sample_mean` to the list `sample_means` and run the simulation.
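One way the completed loop might look is sketched below. The cod weights in the workspace are not reproduced here, so `cod_population` is a hypothetical name filled with made-up, right-skewed values; only the two lines inside the loop correspond to the steps described above.

```python
import numpy as np

# Assumption: synthetic stand-in for the cod weight data loaded in the
# workspace. The name `cod_population` and the gamma parameters are made up.
np.random.seed(1)
cod_population = np.random.gamma(shape=4.0, scale=5.0, size=10_000)

sample_size = 50
sample_means = []

# loop 500 times to get 500 random sample means
for i in range(500):
    # take a sample of 50 cod, without replacement
    samp = np.random.choice(cod_population, sample_size, replace=False)
    # step 1: calculate the mean of this sample
    this_sample_mean = np.mean(samp)
    # step 2: append it to the list of sample means
    sample_means.append(this_sample_mean)

# compare the population mean with the average of the sample means
print("population mean:", np.mean(cod_population))
print("mean of sample means:", np.mean(sample_means))
```

The sample means cluster around the population mean and vary less than the individual weights do, which is what the histogram of `sample_means` would show.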
Awesome, you’ve now estimated the sampling distribution of the mean for a sample size of 50! Inspect the histogram. What do you notice?
Click Run to move on to the next exercise.