Probability is another field that informs data science.
Probability is the mathematical study of how likely events are to happen. Fun fact: some of the earliest known work on probability and statistics grew out of efforts to decode secret messages.
In data science, probability calculations are used to build models. Models help us understand data that does not yet exist: either data we haven’t collected or data that has yet to be created. Data scientists create models to calculate the probability of a certain action, and then they use that probability to make informed decisions.
For instance, the social networking company Facebook collects data on the likes and dislikes of its users. Data scientists use that data to create models that calculate the probability of a user liking a certain advertisement. So, if you like several Facebook posts about football, a model may calculate a high probability of you responding positively to an advertisement selling football tickets at a stadium near you.
The probability of an event sometimes depends on other factors. For instance, in the birthday problem (“What is the probability that at least two people in a room have the same birthday?”), the probability depends on the number of people in the room.
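To make that dependence concrete, here is a minimal sketch of the exact calculation. It assumes a 365-day year with every birthday equally likely and ignores leap years; the function name prob_shared_birthday is illustrative, not something from the lesson's code. It multiplies together the chances that each new person misses all earlier birthdays, then subtracts the result from 1.

```python
def prob_shared_birthday(num_people, days=365):
    """Probability that at least two of num_people share a birthday.

    Assumes `days` equally likely birthdays (a simplification).
    """
    prob_all_different = 1.0
    for i in range(num_people):
        # Person i must avoid the i birthdays already taken.
        prob_all_different *= max(days - i, 0) / days
    return 1.0 - prob_all_different


for n in (2, 10, 23, 50):
    print(n, "people:", round(prob_shared_birthday(n), 4))
```

With 2 people the result is 1/365, and under these assumptions it passes 50% once the room holds 23 people.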
Other times, the probability of an event is constant. For instance, the probability of flipping a fair coin and it landing heads will always be 50%.
In data science, probability is often used to simulate scenarios.
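As a small illustration of simulating a scenario, the sketch below flips a simulated fair coin many times and reports the fraction of heads. It assumes Python's standard random module; the function name fraction_heads and the optional seed parameter are illustrative choices.

```python
import random


def fraction_heads(num_flips, seed=None):
    """Simulate num_flips fair coin flips and return the fraction of heads."""
    rng = random.Random(seed)  # pass a seed to make a run repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


for num_flips in (10, 1_000, 100_000):
    print(num_flips, "flips:", fraction_heads(num_flips))
```

With only a few flips the fraction can be far from 0.5; as the number of flips grows, it tends to settle close to the constant 50% probability.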
The code on the right simulates the birthday problem. Right now the code simulates a room with only 2 people who get random birthdays, and the probability that those 2 people have the same birthday is low: about 0.27% (1 in 365).
Change the number 2 to a higher number of your choosing where it says #Change This Number, and run the code.
Is there a match in the simulation? What’s the probability that there would be a match?
Keep changing the number to test out different simulations.
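For readers without the lesson's code at hand, here is a minimal sketch of what such a simulation might look like. It is not the lesson's actual code: it assumes Python's standard random module and a 365-day year, and the names simulate_room and estimate_match_probability are hypothetical. One function simulates a single room; the other repeats the simulation many times to estimate the probability of a match.

```python
import random


def simulate_room(num_people, rng=random):
    """Give each person a random birthday (day 1 to 365).

    Returns True if at least two people share a birthday.
    """
    birthdays = [rng.randint(1, 365) for _ in range(num_people)]
    # A set drops duplicates, so a shorter set means a match occurred.
    return len(set(birthdays)) < len(birthdays)


def estimate_match_probability(num_people, num_trials=10_000, seed=None):
    """Estimate the match probability as the fraction of simulated rooms with a match."""
    rng = random.Random(seed)
    matches = sum(simulate_room(num_people, rng) for _ in range(num_trials))
    return matches / num_trials


num_people = 2  # Change This Number
print("Match in one simulated room:", simulate_room(num_people))
print("Estimated probability of a match:", estimate_match_probability(num_people))
```

A single simulated room only answers whether a match happened that one time; repeating the simulation is what turns the result into an estimate of the probability.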