Linear Regression Analysis

Anonymous contributor
Published Dec 5, 2024

Linear regression analysis is a supervised machine learning technique that predicts a dependent variable from one or more independent variables, assuming a linear relationship between them. In sklearn, it is provided by the LinearRegression class in the sklearn.linear_model module.

In simple linear regression, the dependent variable Y is predicted from a single independent variable X by fitting the data to a straight line, often called the regression line. The equation for the line is:

$$ Y = \beta_0 + \beta_1 X + \epsilon $$

  • Y: The dependent variable to be predicted.
  • beta_0: The intercept, that is, the predicted value of Y when X is 0.
  • beta_1: The slope coefficient, which measures how much Y changes for a one-unit increase in X.
  • X: The independent variable used to predict Y.
  • epsilon: The error term, which captures the difference between the observed and predicted values of Y.
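To make the roles of these terms concrete, here is a minimal sketch that estimates beta_0 and beta_1 by ordinary least squares, the method LinearRegression uses, with plain Python and no sklearn. The sample data and variable names are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical sample data, chosen only for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)

# Least-squares estimates for simple linear regression:
# slope = S_xy / S_xx, intercept = y_bar - slope * x_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta_1 = s_xy / s_xx
beta_0 = y_bar - beta_1 * x_bar

# Residuals are the observed values minus the values predicted by the line
predicted = [beta_0 + beta_1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(f"beta_0 = {beta_0:.3f}, beta_1 = {beta_1:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

The residuals play the role of epsilon for each observation. With an intercept in the model, they sum to approximately zero.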

Syntax

Here’s the basic syntax for implementing linear regression analysis in Python:

from sklearn.linear_model import LinearRegression

# Create the model
model = LinearRegression()

# Fit the model
model.fit(X, y)

# Predict the dependent variable
predictions = model.predict(X)  # X is the input for which predictions are to be made
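The syntax above leaves the shapes of X and y implicit. The following sketch, using hypothetical data, shows the expected layout: X is a 2D array of shape (n_samples, n_features), y is a 1D array with one target per sample, and predict() accepts new inputs with the same number of feature columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data that follows y = 2x + 1 exactly
X_train = np.array([1, 2, 3, 4]).reshape(-1, 1)  # shape (4, 1): 4 samples, 1 feature
y_train = np.array([3, 5, 7, 9])                 # shape (4,): one target per sample

model = LinearRegression()
model.fit(X_train, y_train)

# New inputs need the same number of feature columns as the training data
X_new = np.array([5, 6]).reshape(-1, 1)
new_predictions = model.predict(X_new)

print(X_train.shape, y_train.shape)
print(new_predictions)
```

Passing a 1D array as X raises an error in sklearn, which is why the reshape(-1, 1) call is needed for a single feature.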

Example

In this example, the dependent variable y is equal to the independent variable X at every point, so the fitted regression line passes through all the data points with an intercept of 0 and a slope of 1:

# Import required libraries
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Sample data (Simple Linear Regression)
# X: Independent variable, y: Dependent variable
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Reshaping to 2D array for sklearn
y = np.array([1, 2, 3, 4, 5])

# Create the Linear Regression model
model = LinearRegression()

# Fit the model
model.fit(X, y)

# Make predictions
predictions = model.predict(X)

# Output the results
print(f"Intercept (β₀): {model.intercept_}")
print(f"Coefficient (β₁): {model.coef_[0]}")
print(f"Predictions: {predictions}")

# Plot the data and regression line
plt.scatter(X, y, color='red', label='Data Points')
plt.plot(X, predictions, color='green', label='Regression Line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Simple Linear Regression')
plt.legend()
plt.show()

The code outputs the following result:

Intercept (β₀): 0.0
Coefficient (β₁): 1.0
Predictions: [1. 2. 3. 4. 5.]

![A scatter plot showing the data points (red dots) and the corresponding regression line (green) representing the simple linear regression model.](https://raw.githubusercontent.com/Codecademy/docs/main/media/linear-regressin-analysis.png)
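Real data rarely falls exactly on a line. As a hedged sketch with hypothetical noisy data, the snippet below fits the same kind of model, computes the residuals (the observed values minus the predictions), and reports the R² value returned by the model's score() method as one possible measure of fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noisy data: roughly y = x, with small deviations
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

model = LinearRegression()
model.fit(X, y)

predictions = model.predict(X)

# Residuals: observed values minus predicted values
residuals = y - predictions

# score() returns the coefficient of determination (R²) on the given data
r2 = model.score(X, y)

print(f"Intercept: {model.intercept_:.3f}")
print(f"Coefficient: {model.coef_[0]:.3f}")
print(f"Residuals: {np.round(residuals, 3)}")
print(f"R²: {r2:.3f}")
```

Here the residuals are no longer zero, and an R² close to 1 suggests that the line explains most of the variation in y.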
