Learn

As we saw in the last example, each time we sample from a population, we will get a slightly different sample mean. In order to understand how much variation we can expect in those sample means, we can do the following:

  • Take a bunch of random samples of fish, each of the same size (50 fish in this example)
  • Calculate the sample mean for each one
  • Plot a histogram of all the sample means

This process gives us an estimate of the sampling distribution of the mean for a sample size of 50 fish.

The code to accomplish this is shown below:

    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt

    # population is assumed to be the table of fish weights loaded earlier in the lesson
    salmon_population = population['Salmon_Weight']
    sample_size = 50
    sample_means = []

    # loop 500 times to get 500 random sample means
    for i in range(500):
        # take a sample from the data:
        samp = np.random.choice(salmon_population, sample_size, replace = False)
        # calculate the mean of this sample:
        this_sample_mean = np.mean(samp)
        # append this sample mean to a list of sample means
        sample_means.append(this_sample_mean)

    # plot all the sample means to show the sampling distribution
    sns.histplot(sample_means, stat='density')
    plt.title("Sampling Distribution of the Mean")
    plt.show()

The distribution of the sample_means looks like this:

This is an estimated sampling distribution of the mean, built from 500 samples of 50 fish each. The distribution is centered around x=60 and looks fairly symmetrical.
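
To check the center of a sampling distribution numerically rather than by eye, we can compare the mean of the sample means with the population mean. The sketch below is only an illustration: it assumes a synthetic, normally distributed population (mean 60, standard deviation 10, 10,000 fish) as a stand-in for the salmon data, which is not reproduced here. Those parameters, the seed, and the use of NumPy's Generator API are all assumptions made for the example.

```python
import numpy as np

# fixed seed (an assumption) so the run is repeatable
rng = np.random.default_rng(0)

# hypothetical stand-in for population['Salmon_Weight']
salmon_population = rng.normal(loc=60, scale=10, size=10_000)

sample_size = 50
sample_means = []

# take 500 random samples of 50 fish and record each sample mean
for i in range(500):
    samp = rng.choice(salmon_population, sample_size, replace=False)
    sample_means.append(np.mean(samp))

population_mean = np.mean(salmon_population)
mean_of_sample_means = np.mean(sample_means)
spread_of_sample_means = np.std(sample_means)

print("population mean:       ", population_mean)
print("mean of sample means:  ", mean_of_sample_means)
print("spread of sample means:", spread_of_sample_means)
```

With this synthetic population, the mean of the sample means should land close to the population mean, and the sample means should vary much less than the individual fish weights do.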

Note that we can look at a sampling distribution for any statistic. For example, we could estimate the sampling distribution of the maximum by calculating the maximum of each sample, rather than the mean (as shown above).
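
As a rough sketch of that idea, the loop below records the maximum of each sample instead of the mean. Because the salmon data is not reproduced here, it again assumes a synthetic, normally distributed population (mean 60, standard deviation 10, 10,000 fish); the population, the seed, and the name sample_maxes are hypothetical choices for illustration.

```python
import numpy as np

# fixed seed (an assumption) so the run is repeatable
rng = np.random.default_rng(1)

# hypothetical stand-in for population['Salmon_Weight']
salmon_population = rng.normal(loc=60, scale=10, size=10_000)

sample_size = 50
sample_maxes = []

# take 500 random samples of 50 fish and record each sample maximum
for i in range(500):
    samp = rng.choice(salmon_population, sample_size, replace=False)
    sample_maxes.append(np.max(samp))

print("average sample maximum:", np.mean(sample_maxes))
print("population maximum:    ", np.max(salmon_population))
```

To see the shape of this distribution, sample_maxes can be passed to sns.histplot(sample_maxes, stat='density') in the same way as sample_means above.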

Instructions

1. Let’s estimate the sampling distribution of the mean using a population of cod. As we did with the salmon, we will pretend we are all-knowing and have captured weight data on every cod in the ocean. In the workspace, we’ve loaded in the cod weight data.

We’ve set the sample size equal to 50 and created a for loop to take 500 random samples.

  • Inside the for loop, use the function np.mean() to calculate the mean of each sample. Save this to a variable called this_sample_mean.
  • Then, still inside the for loop, append this_sample_mean to the list sample_means and run the simulation.

2. Awesome, you’ve now estimated the sampling distribution of the mean for a sample size of 50! Inspect the histogram. What do you notice?

Click Run to move on to the next exercise.
