Linear Models


Linear models are foundational algorithms in machine learning for both regression and classification tasks. These models assume that the target variable can be expressed as a linear combination of the input features, i.e., ŷ = w₁x₁ + ⋯ + wₙxₙ + b, where the wᵢ are learned weights and b is the intercept. This makes them simple yet effective for many datasets. In regression, the model predicts a continuous output; in classification, it draws a linear decision boundary between classes.

Scikit-learn offers a variety of linear models, each suited to different scenarios (a sketch of their shared interface follows this list):

  • Linear Regression: Ideal for predicting continuous outputs by minimizing the sum of squared errors.
  • Ridge and Lasso Regression: Regularized linear models that address overfitting; Ridge uses L2 regularization, while Lasso employs L1 regularization for feature selection.
  • Elastic Net: Combines L1 and L2 regularization for a balance between Ridge and Lasso.
  • Logistic Regression: A classification algorithm using a logistic (sigmoid) function to handle binary or multiclass problems.
  • Bayesian Ridge Regression: Incorporates probabilistic modeling for regression, providing uncertainty estimates.
  • SGD Regressor and Classifier: Use stochastic gradient descent for scalable learning on large datasets.
  • Perceptron: A basic linear classifier based on a single-layer perceptron algorithm.
  • Passive-Aggressive Regressor and Classifier: Efficient for online learning and datasets arriving incrementally.
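
Despite their different loss functions, all of these estimators share scikit-learn's fit/predict interface, so they are largely interchangeable in code. Here is a minimal sketch; the alpha and l1_ratio values are illustrative defaults, not tuned choices:

from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import ElasticNet, Lasso, LogisticRegression, Ridge

# Synthetic data for the regression and classification estimators
X_reg, y_reg = make_regression(n_samples=100, n_features=5, noise=5, random_state=0)
X_clf, y_clf = make_classification(n_samples=100, n_features=5, random_state=0)

# The regularized regressors all expose the same fit/predict interface
for estimator in [Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)]:
    estimator.fit(X_reg, y_reg)
    print(type(estimator).__name__, estimator.predict(X_reg[:3]))

# Logistic Regression follows the same pattern for classification
clf = LogisticRegression()
clf.fit(X_clf, y_clf)
print("LogisticRegression:", clf.predict(X_clf[:3]))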

These algorithms make linear models highly versatile and widely applicable across domains, from predicting house prices to modeling customer churn.

Syntax

Linear models in scikit-learn are implemented through various algorithms, each suited to specific use cases. Below is the syntax for one such algorithm, Linear Regression, which fits a regression model by minimizing the sum of squared errors:

from sklearn.linear_model import LinearRegression

# Initializing the Linear Regression model
model = LinearRegression()

# Fitting the Linear Regression model on training data
model.fit(X_train, y_train)

# Predicting on new data
y_pred = model.predict(X_test)
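
Once fitted, the learned weights and intercept are available through the model's coef_ and intercept_ attributes. A minimal sketch, assuming the model and training data from the snippet above:

# Inspecting the learned parameters after fitting
print("Coefficients:", model.coef_)  # one weight per input feature
print("Intercept:", model.intercept_)  # the bias term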

Example

Here’s a practical example using Linear Regression, one of the core algorithms in scikit-learn’s linear models:

from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generating a synthetic regression dataset
X, y = make_regression(n_samples=50, n_features=2, noise=10, random_state=44)

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=44)

# Initializing and training the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predicting and displaying results
y_pred = model.predict(X_test)
print("Predicted values:", y_pred)

The code above produces the following possible output:

Predicted values: [ -24.64620375 -137.3587294 -108.54877597 10.60212286 -88.32083414
-73.45693671 -152.01390923 36.74143344 -2.06156108 149.66250584
-102.16294566 82.65076848 45.94240352 -181.44107066 -10.34601911]

The output consists of the predicted values for the test dataset, as generated by the Linear Regression model. Since the data is synthetic, the exact values depend on the randomly generated dataset.

In this example, Linear Regression is used to model a simple synthetic dataset. The same principles apply to other algorithms within the linear models family, such as Ridge, Lasso, and Logistic Regression, depending on the task.
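
For instance, Ridge can be dropped into the same workflow with no other changes. A brief sketch, assuming the variables from the example above; the alpha value is illustrative, not tuned:

from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Swapping in a regularized model; the rest of the workflow is unchanged
ridge = Ridge(alpha=1.0)  # alpha chosen for illustration only
ridge.fit(X_train, y_train)

# Comparing both models on the held-out test set
print("LinearRegression R^2:", r2_score(y_test, y_pred))
print("Ridge R^2:", r2_score(y_test, ridge.predict(X_test)))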
