So far, we’ve learned that the equation for a logistic regression model looks like this:
ln(p / (1 - p)) = b_0 + b_1x_1 + b_2x_2 + ... + b_nx_n
Note that we’ve replaced y with the letter p because we are going to interpret it as a probability (e.g., the probability of a student passing the exam). The whole left-hand side of this equation is called the log-odds because it is the natural logarithm (ln) of the odds (p/(1-p)). The right-hand side of this equation looks exactly like regular linear regression!
In order to understand how this link function works, let’s dig into the interpretation of log-odds a little more. The odds of an event occurring are:
odds = P(event occurring) / P(event not occurring) = p / (1 - p)
For example, suppose that the probability a student passes an exam is 0.7. That means the probability of failing is 1 - 0.7 = 0.3. Thus, the odds of passing are:
odds of passing = 0.7 / 0.3 ≈ 2.33
This means that students are about 2.33 times as likely to pass as to fail.
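The arithmetic in the example above can be checked with a few lines of Python. This is only a minimal sketch; the variable names p_pass and odds_of_passing are made up for illustration and are not part of the lesson's exercises.

```python
# Probability that a student passes the exam (taken from the example above)
p_pass = 0.7

# Odds = probability the event occurs divided by probability it does not occur
odds_of_passing = p_pass / (1 - p_pass)

# Round for display; the exact value is 7/3
print(round(odds_of_passing, 2))
```

Odds above 1 indicate that the event is more likely to happen than not, which matches a passing probability above 0.5.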
Odds cannot be negative: they range from 0 (when p = 0) toward positive infinity (as p approaches 1). When we take the natural log of the odds (the log odds), we map that range onto every value between negative and positive infinity, which is exactly what we need! The logit function (log odds) transforms a probability (a number between 0 and 1) into a continuous value that can be positive or negative.
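To see this transformation in action, the sketch below applies the log odds to a handful of probabilities. The helper name logit and the sample probabilities are assumptions chosen for illustration; only numpy.log() comes from the lesson itself.

```python
import numpy as np

def logit(p):
    """Return the log odds ln(p / (1 - p)) for probabilities strictly between 0 and 1."""
    return np.log(p / (1 - p))

# Illustrative probabilities (hypothetical values, not from the exercises)
probabilities = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
log_odds = logit(probabilities)

# Print each probability next to its log odds
for p, value in zip(probabilities, log_odds):
    print(f"p = {p:.2f} -> log odds = {value:+.3f}")
```

Probabilities below 0.5 give negative log odds, 0.5 gives exactly 0, and probabilities above 0.5 give positive log odds. The output is also symmetric: p and 1 - p produce log odds of equal size and opposite sign.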
Instructions
Suppose that there is a 40% probability of rain today (p = 0.4). Calculate the odds of rain and save it as odds_of_rain. Note that the odds are less than 1 because the probability of rain is less than 0.5.
Feel free to print odds_of_rain to see the results.
Use the odds that you calculated above to calculate the log odds of rain and save it as log_odds_of_rain. You can calculate the natural log of a value using the numpy.log() function. Note that the log odds are negative because the probability of rain was less than 0.5.
Feel free to print log_odds_of_rain to see the results.
Suppose that there is a 90% probability that my train to work arrives on time. Calculate the odds of my train being on time and save it as odds_on_time. Note that the odds are greater than 1 because the probability is greater than 0.5.
Feel free to print odds_on_time to see the results.
Use the odds that you calculated above to calculate the log odds of an on-time train and save it as log_odds_on_time. Note that the log odds are positive because the probability of an on-time train was greater than 0.5.
Feel free to print log_odds_on_time to see the results.