In Linear Regression we multiply the coefficients of our features by their respective feature values and add the intercept, resulting in our prediction, which can range from -∞ to +∞. In Logistic Regression, we make the same multiplication of feature coefficients and feature values and add the intercept, but instead of the prediction, we get what is called the log-odds.
The log-odds are another way of expressing the probability of a sample belonging to the positive class, or, in our example, of a student passing the exam. In probability, we calculate the odds of an event occurring as follows:
Odds = P(event occurring) / P(event not occurring)
The odds tell us how many times more likely an event is to occur than not to occur. If a student will pass the exam with probability 0.7, they will fail with probability 1 - 0.7 = 0.3. We can then calculate the odds of passing as:
Odds of passing = 0.7 / 0.3 ≈ 2.33
The log-odds are then simply the natural logarithm of the odds:
Log-odds of passing = ln(0.7 / 0.3) ≈ 0.847
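As a quick check of this arithmetic, here is a minimal sketch using only Python's standard math module. The variable names are chosen for illustration, and the logarithm is assumed to be the natural log, which is the convention for logistic regression.

```python
import math

p_pass = 0.7            # probability of passing, from the example in the text
p_fail = 1 - p_pass     # probability of failing

odds_pass = p_pass / p_fail            # how many times more likely passing is than failing
log_odds_pass = math.log(odds_pass)    # natural logarithm of the odds

print(round(odds_pass, 2))      # 2.33
print(round(log_odds_pass, 3))  # 0.847
```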
For our Logistic Regression model, however, we calculate the log-odds, represented by z below, by multiplying each feature value by its respective coefficient, summing the products, and adding the intercept. This allows us to map our feature values to a measure of how likely it is that a data sample belongs to the positive class.
z = b_0 + b_1 x_1 + b_2 x_2 + ... + b_n x_n
b_0 is the intercept
b_1, ..., b_n are the coefficients of the features x_1, ..., x_n
This kind of multiplication and summing is known as a dot product.
We can perform a dot product using NumPy's np.dot() function! Given a feature matrix features, a coefficient vector coefficients, and an intercept, we can calculate the log-odds in NumPy as follows:
log_odds = np.dot(features, coefficients) + intercept
np.dot() will take each row, or student, in features, multiply each individual feature value by its respective coefficient in coefficients, and sum the results.
We then add in the intercept to get the log-odds!
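To make the row-by-row behaviour concrete, here is a small sketch with made-up numbers. The feature values, coefficients, and intercept below are hypothetical and are not taken from the lesson's data. It compares a hand-written multiply-and-sum against np.dot().

```python
import numpy as np

# Hypothetical data: 3 students (rows), 2 features each (columns)
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])
coefficients = np.array([0.5, -1.0])  # one made-up coefficient per feature
intercept = 2.0                       # made-up intercept

# By hand: for each row, multiply each feature by its coefficient and sum
manual = np.array([
    row[0] * coefficients[0] + row[1] * coefficients[1]
    for row in features
])

# The same computation for every row at once
dot_product = np.dot(features, coefficients)

# Adding the intercept gives one log-odds value per student
log_odds = dot_product + intercept

print(dot_product.tolist())  # [-1.5, -2.5, -3.5]
print(log_odds.tolist())     # [0.5, -0.5, -1.5]
```

The scalar intercept is added to every element of the dot product, so each student ends up with a single log-odds value.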
Let’s create a function log_odds that takes features, coefficients, and intercept as parameters. For now, have log_odds return the dot product of features and coefficients.
Then update the return statement of log_odds by adding the intercept after the dot product.
Using the log_odds function you created, let’s calculate the log-odds of passing for the Introductory Machine Learning students. Use hours_studied as the features, calculated_coefficients as the coefficients, and intercept as the intercept. Store the result in calculated_log_odds, and print it out.
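One possible way to put these steps together is sketched below. The lesson's actual values for hours_studied, calculated_coefficients, and intercept are not shown in this text, so the numbers here are hypothetical stand-ins chosen only so that the code runs on its own.

```python
import numpy as np

def log_odds(features, coefficients, intercept):
    # Dot product of features and coefficients, then add the intercept
    return np.dot(features, coefficients) + intercept

# Hypothetical stand-ins for the lesson's data (assumed values, not the real ones)
hours_studied = np.array([[0.0], [5.0], [10.0], [20.0]])  # one feature per student
calculated_coefficients = np.array([0.2])                  # assumed coefficient
intercept = -3.0                                           # assumed intercept

calculated_log_odds = log_odds(hours_studied, calculated_coefficients, intercept)
print(calculated_log_odds)
```

With a single feature per student, hours_studied is a column of values and calculated_coefficients holds one coefficient, so the result contains one log-odds value per student.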