Suppose you want to know if students who study history are more interested in volleyball than students who study chemistry. Before doing anything else to answer your original question, you come up with a null hypothesis: "History and chemistry students are interested in volleyball at the same rates."

To test this hypothesis, you need to design an experiment and collect data. You invite 100 history majors and 100 chemistry majors from your university to join an extracurricular volleyball team. After one week, 34 history majors have signed up (34%), and 39 chemistry majors have signed up (39%). More chemistry majors than history majors signed up, but is this a “real,” or significant, difference? Can you conclude that students who study chemistry are more interested in volleyball than students who study history?

In your experiment, the 100 history majors and 100 chemistry majors at your university are samples of their respective populations (all history majors and all chemistry majors). The sample means are the percentages of history majors (34%) and chemistry majors (39%) that signed up for the team, and the difference in sample means is 39% - 34% = 5%. The population means are the percentages of history and chemistry majors worldwide that would sign up for an extracurricular volleyball team if given the chance.
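The arithmetic behind the sample means can be written out as a short Python sketch. The variable names here are illustrative choices, not part of the lesson; the counts are the ones given in the example.

```python
# Sign-up counts from the example above
history_invited = 100
history_signed_up = 34
chemistry_invited = 100
chemistry_signed_up = 39

# Sample means: the proportion of each sample that signed up
history_rate = history_signed_up / history_invited
chemistry_rate = chemistry_signed_up / chemistry_invited

# Difference in sample means (chemistry minus history)
difference = chemistry_rate - history_rate

print(f"History sign-up rate: {history_rate:.0%}")
print(f"Chemistry sign-up rate: {chemistry_rate:.0%}")
print(f"Difference: {difference * 100:.0f} percentage points")
```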

You want to know whether the difference you observed in these sample means (5%) reflects a difference in the population means, or whether it was caused by sampling error, meaning the samples of students you happened to choose do not perfectly represent the greater populations of history and chemistry students.

Restating the null hypothesis in terms of the population means yields the following:

"The percentage of all history majors who would sign up for volleyball is the same as the percentage of all chemistry majors who would sign up for volleyball, and the observed difference in sample means is due to sampling error."

This is the same as saying, “If you gave the same volleyball invitation to every history and chemistry major in the world, they would sign up at the same rate, and the samples of 100 history majors and 100 chemistry majors you selected are not perfectly representative of their populations.”
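To get a feel for how large a gap sampling error alone can produce, here is a rough simulation sketch in Python. It assumes the null hypothesis is true and that both populations share a single sign-up rate of 36.5% (the pooled 73 sign-ups out of 200 invitations). That shared rate, the trial count, and all names below are assumptions made for illustration, not values from the lesson, and this is not a formal hypothesis test.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

ASSUMED_RATE = 0.365  # assumption: shared population sign-up rate (73 of 200 pooled)
SAMPLE_SIZE = 100     # students invited per major
OBSERVED_GAP = 5      # observed difference in sign-ups (39 - 34)
TRIALS = 2000         # number of simulated experiments (arbitrary choice)


def simulate_signups(rate, n):
    """Count how many of n invited students sign up, each with probability `rate`."""
    return sum(random.random() < rate for _ in range(n))


gaps_at_least_observed = 0
for _ in range(TRIALS):
    history = simulate_signups(ASSUMED_RATE, SAMPLE_SIZE)
    chemistry = simulate_signups(ASSUMED_RATE, SAMPLE_SIZE)
    # Compare integer counts to avoid floating-point rounding issues
    if abs(chemistry - history) >= OBSERVED_GAP:
        gaps_at_least_observed += 1

fraction = gaps_at_least_observed / TRIALS
print(f"Share of simulated experiments with a gap of at least "
      f"{OBSERVED_GAP} sign-ups: {fraction:.2f}")
```

If a sizeable share of the simulated experiments show a gap at least as large as the observed one, then a 5% difference is the kind of result sampling error could plausibly produce even when the population rates are identical.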



Your friend is a dog walker who specializes in working with Golden Retrievers and Goldendoodles. They want to know if there is a significant difference in the lengths of the two breeds. After a few weeks of data collection, they give you a spreadsheet with the lengths of 10 Golden Retrievers and 10 Goldendoodles.

The lengths of the dogs are given in retriever_lengths and doodle_lengths. Calculate the mean of each breed and save the results to mean_retriever_l and mean_doodle_l. View mean_retriever_l and mean_doodle_l.


Calculate the difference between mean_retriever_l and mean_doodle_l and save the result to mean_difference. View mean_difference.
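The two steps above could look like the following Python sketch. The lesson's actual spreadsheet values are not shown here, so the numbers in retriever_lengths and doodle_lengths below are made-up placeholders (units unspecified), and the lesson's own workspace may use a different language.

```python
from statistics import mean

# Placeholder data: NOT the lesson's real measurements
retriever_lengths = [40, 41, 42, 43, 44, 40, 41, 42, 43, 44]
doodle_lengths = [36, 38, 40, 42, 44, 36, 38, 40, 42, 44]

# Mean length of each breed
mean_retriever_l = mean(retriever_lengths)
mean_doodle_l = mean(doodle_lengths)
print(mean_retriever_l)
print(mean_doodle_l)

# Difference in sample means (retrievers minus doodles)
mean_difference = mean_retriever_l - mean_doodle_l
print(mean_difference)
```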


You want to run a hypothesis test to see if there is a significant difference in the lengths of Golden Retrievers and Goldendoodles. Which of the two statements could be a formulation of the null hypothesis?

Update the value of null_hypo with "st_1" or "st_2" depending on your answer.
