In statistics, we often want to learn about a large population. Since collecting data for an entire population is usually impossible, researchers may use a smaller sample of data to try to answer their questions.
To do this, a researcher might calculate a statistic, such as the mean or median, for a sample of data. Then they can use that statistic as an estimate of the population value they really care about.
For example, suppose that researchers want to know the average weight of all Atlantic salmon. It would be impossible to catch every single fish. Instead, the researchers might collect a sample of 50 fish off the coast of Nova Scotia and determine that the average weight of those fish is x. If the same researchers collected 50 new fish and calculated the average weight again, that average would likely be slightly different from the first sample average.
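To make the salmon example concrete, here is a minimal Python sketch. It assumes a made-up population of 10,000 salmon weights (the mean of 10, the spread of 2, and the use of a normal distribution are all invented for illustration, not real data) and draws two separate samples of 50 fish.

```python
import random
import statistics

# Hypothetical population: 10,000 salmon weights. The center (10) and
# spread (2) are made-up values chosen purely for illustration.
rng = random.Random(42)
population = [rng.gauss(10, 2) for _ in range(10_000)]

def sample_mean(pop, size, rng):
    """Return the mean of a random sample of `size` items drawn without replacement."""
    return statistics.mean(rng.sample(pop, size))

# Two independent samples of 50 fish from the same population.
first_mean = sample_mean(population, 50, rng)
second_mean = sample_mean(population, 50, rng)

print(f"Population mean:    {statistics.mean(population):.2f}")
print(f"First sample mean:  {first_mean:.2f}")
print(f"Second sample mean: {second_mean:.2f}")
```

Under these assumptions, the two sample means should land near the population mean without matching it, or each other, exactly.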
Over the course of this lesson, we will go over how we can use sample statistics to estimate population values and how to describe our uncertainty about those estimates.
The applet to the right shows a population distribution and a sample distribution that was randomly drawn from that population. The Generate button will randomly select a new sample of a particular size (indicated by the “sample size” field).
Set the sample size to 6 and click Generate. Compare the sample mean to the population mean.
- If a researcher used this sample to estimate the population mean, how far off would they be? Click Generate a few more times, paying attention to the sample mean.
- A sample size of 6 is very small, so the sample mean varies a lot depending on which salmon we randomly choose.
- Because of this, each individual sample may not accurately describe the full population of salmon. Since the sample size is so small, extreme values will have a greater impact on our estimate of the population mean.
Next, increase the sample size to 100 and click Generate a few times, paying attention to the sample mean.
- With a larger sample size, do the sample means now seem closer to the population mean?
- Generally, with larger sample sizes, the sample mean tends to be closer to the population mean. Extreme values now have a smaller impact on our estimate of the population mean.
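The comparison between sample sizes of 6 and 100 can also be sketched as a short simulation. The Python code below is an illustration only: the population is hypothetical (10,000 made-up weights drawn from a normal distribution), and the helper name `sample_means` and the choice of 1,000 repeated samples are assumptions, not part of the applet.

```python
import random
import statistics

# Hypothetical population of salmon weights (made-up values for illustration).
rng = random.Random(0)
population = [rng.gauss(10, 2) for _ in range(10_000)]

def sample_means(pop, size, repeats, rng):
    """Draw `repeats` random samples of `size` items and return each sample's mean."""
    return [statistics.mean(rng.sample(pop, size)) for _ in range(repeats)]

# Mimic clicking Generate 1,000 times at each sample size.
means_small = sample_means(population, 6, 1000, rng)
means_large = sample_means(population, 100, 1000, rng)

# The standard deviation of the sample means measures how much they vary
# from one sample to the next.
spread_small = statistics.stdev(means_small)
spread_large = statistics.stdev(means_large)

print(f"Population mean: {statistics.mean(population):.2f}")
print(f"Spread of sample means (n=6):   {spread_small:.2f}")
print(f"Spread of sample means (n=100): {spread_large:.2f}")
```

Under these assumptions, the sample means for a sample size of 100 should cluster much more tightly around the population mean than those for a sample size of 6.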