In the previous exercise, the sample means you calculated closely approximated the population mean. This won’t always be the case!
Consider a tailor who makes uniforms for a school with students aged 11 to 13. The tailor needs to know the average height of all the students in order to decide which sizes to make.
The tailor measures the heights of a random sample of 20 students out of the 300 in the school. The average height of the sample is 57.5 inches. Using this sample mean, the tailor makes uniforms that fit students of this height, along with some smaller and some larger sizes.
After delivering the uniforms, the tailor starts to receive some feedback — many of the uniforms are too small! They go back to take measurements on the rest of the students, collecting the following data:
- 11 year olds average height:
- 12 year olds average height:
- 13 year olds average height:
- All students average height (population mean):
The original sample mean was off from the population mean by 2 inches! How did this happen?
The random sample of 20 students was skewed toward one end of the total population. More 11-year-olds were chosen in the sample than is representative of the whole school, which brought down the average height of the sample. This is called a sampling error, and it occurs when a sample is not representative of the population it comes from. How do you get an average sample height that looks more like the average population height, and reduce the chance of a sampling error?
Selecting only 20 students for the sample allowed for the chance that only younger, shorter students were included. This is a natural consequence of the fact that a sample has less data than the population to which it belongs. If the sample happens to be unrepresentative, the sample mean can differ seriously from the population mean.
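To see how much a sample of 20 can wander, here is a minimal sketch that simulates the tailor's situation. The lesson does not give the per-age heights, so the group means, the spread, and the 100-students-per-age split below are hypothetical values chosen only for illustration, and `sample_mean_range` is a made-up helper name.

```python
import random
import statistics


def sample_mean_range(population, sample_size, trials=1000, seed=0):
    """Draw many random samples and return the lowest and highest sample mean."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(trials)
    ]
    return min(means), max(means)


# Hypothetical school of 300 students: 100 per age group.
# The group means (in inches) and spread are assumptions, not lesson data.
school_rng = random.Random(42)
heights = (
    [school_rng.gauss(57.0, 2.0) for _ in range(100)]    # 11-year-olds
    + [school_rng.gauss(59.5, 2.0) for _ in range(100)]  # 12-year-olds
    + [school_rng.gauss(62.0, 2.0) for _ in range(100)]  # 13-year-olds
)

population_mean = statistics.mean(heights)
lowest, highest = sample_mean_range(heights, sample_size=20)

print("Population mean:", round(population_mean, 2))
print("Lowest sample mean: ", round(lowest, 2))
print("Highest sample mean:", round(highest, 2))
```

Every sample here is drawn fairly at random, yet the lowest and highest sample means can still sit noticeably apart. An unlucky draw like the tailor's is one of the samples near the low end of that range.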
There is one reliable way to reduce the risk of a skewed sample mean: take a larger sample! The mean of a larger sample will, on average, more closely approximate the population mean, which reduces the chance of a sampling error.
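The effect of sample size can be sketched with a short simulation. This is an illustration under assumptions, not the workspace code: the population values (mean 60, standard deviation 3, 300 members) are hypothetical, and `average_sampling_error` is a made-up helper that averages the gap between sample mean and population mean over many random samples.

```python
import random
import statistics


def average_sampling_error(population, sample_size, trials=500, seed=0):
    """Average absolute gap between a sample mean and the population mean."""
    rng = random.Random(seed)
    population_mean = statistics.mean(population)
    gaps = []
    for _ in range(trials):
        # rng.sample draws without replacement, like measuring distinct students
        sample = rng.sample(population, sample_size)
        gaps.append(abs(statistics.mean(sample) - population_mean))
    return statistics.mean(gaps)


# Hypothetical normally distributed population (values are assumptions)
population_rng = random.Random(1)
population = [population_rng.gauss(60.0, 3.0) for _ in range(300)]

for size in (5, 20, 100, 300):
    error = average_sampling_error(population, size)
    print(f"sample size {size:>3}: average error {error:.3f}")
```

The average error shrinks as the sample size grows, and it reaches zero when the sample is the entire population, because at that point the sample mean is the population mean.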
In the workspace, we have a population that is normally distributed. Generate samples of different sizes and see how the sample mean could differ from the population mean.
What happens to the difference between the sample mean and the population mean as you increase the sample size?