Linear Models
Linear models are foundational algorithms in machine learning for both regression and classification tasks. These models assume that the target variable can be expressed as a linear combination of the input features, making them simple yet effective for many datasets. In regression, the output is continuous, while in classification, a linear decision boundary separates the classes.
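For intuition, a linear model's prediction is just a weighted sum of the features plus an intercept. The minimal sketch below uses made-up weights and feature values purely for illustration; they do not come from a fitted model:

import numpy as np

# Illustrative coefficients, intercept, and a single sample with two features
w = np.array([3.0, -2.0])   # hypothetical weights (one per feature)
b = 5.0                     # hypothetical intercept (bias) term
x = np.array([1.5, 4.0])    # one sample with two feature values

# The prediction is the dot product of weights and features plus the intercept
y_hat = np.dot(w, x) + b
print(y_hat)  # 3.0*1.5 + (-2.0)*4.0 + 5.0 = 1.5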
Scikit-learn offers a variety of linear models, each suited to different scenarios:
- Linear Regression: Ideal for predicting continuous outputs by minimizing the sum of squared errors.
- Ridge and Lasso Regression: Regularized linear models that address overfitting; Ridge uses L2 regularization, while Lasso employs L1 regularization for feature selection.
- Elastic Net: Combines L1 and L2 regularization for a balance between Ridge and Lasso.
- Logistic Regression: A classification algorithm using a logistic (sigmoid) function to handle binary or multiclass problems.
- Bayesian Ridge Regression: Incorporates probabilistic modeling for regression, providing uncertainty estimates.
- SGD Regressor and Classifier: Use stochastic gradient descent for scalable learning on large datasets.
- Perceptron: A basic linear classifier based on a single-layer perceptron algorithm.
- Passive-Aggressive Regressor and Classifier: Efficient for online learning and datasets arriving incrementally.
These algorithms make linear models highly versatile and widely applicable across domains, from predicting house prices to predicting customer churn; a brief sketch of several of these estimators follows below.
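All of these estimators live in the sklearn.linear_model module and share the same fit/predict interface. As a rough sketch (the alpha and l1_ratio values below are arbitrary placeholders, not recommended settings), they can be instantiated as follows:

from sklearn.linear_model import (
    LinearRegression,
    Ridge,
    Lasso,
    ElasticNet,
    LogisticRegression,
    SGDRegressor,
)

# Regularized regressors; alpha controls regularization strength (placeholder values)
ridge = Ridge(alpha=1.0)                    # L2 regularization
lasso = Lasso(alpha=0.1)                    # L1 regularization, can zero out coefficients
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)  # mix of L1 and L2 penalties

# Classification and large-scale learning
log_reg = LogisticRegression()              # linear classifier with a sigmoid/softmax output
sgd_reg = SGDRegressor()                    # linear regressor trained with stochastic gradient descent

# All of these expose the same .fit(X, y) and .predict(X) methods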
Syntax
Linear models in scikit-learn are implemented through various algorithms, each suited for specific use cases. Below is the syntax for one such algorithm, Linear Regression, a regression model that minimizes the sum of squared errors:
from sklearn.linear_model import LinearRegression
# Initializing the Linear Regression model
model = LinearRegression()
# Fitting the Linear Regression model on training data
model.fit(X_train, y_train)
# Predicting on new data
y_pred = model.predict(X_test)
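After fitting, the learned parameters can be inspected; for LinearRegression they are stored in the coef_ and intercept_ attributes (this assumes model, X_train, and y_train are defined as above):

# Inspecting the learned parameters after fitting
print(model.coef_)       # one coefficient per input feature
print(model.intercept_)  # the learned intercept (bias) term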
Example
Here’s a practical example using Linear Regression, one of the core algorithms in scikit-learn’s linear models:
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generating a synthetic regression dataset
X, y = make_regression(n_samples=50, n_features=2, noise=10, random_state=44)

# Splitting data into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=44)

# Initializing and training the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predicting and displaying results
y_pred = model.predict(X_test)
print("Predicted values:", y_pred)
The code above produces the following possible output:
Predicted values: [ -24.64620375 -137.3587294 -108.54877597 10.60212286 -88.32083414 -73.45693671 -152.01390923 36.74143344 -2.06156108 149.66250584 -102.16294566 82.65076848 45.94240352 -181.44107066 -10.34601911]
The output of the code will be the predicted values for the test dataset generated by the Linear Regression model. Since the data is synthetic, the exact values depend on the randomly generated dataset.
In this example, Linear Regression is used to model a simple synthetic dataset. The same principles apply to other algorithms within the linear models family, such as Ridge, Lasso, and Logistic Regression, depending on the task.
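As a rough illustration of that point, the same workflow could be rerun with a regularized model such as Ridge and evaluated with a metric like the R² score. The alpha value below is an arbitrary placeholder, and X_train, X_test, y_train, and y_test are assumed to come from the split above:

from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Swapping in Ridge regression; alpha is a placeholder regularization strength
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)

# Scoring the predictions on the held-out test data
ridge_pred = ridge_model.predict(X_test)
print("R^2 score:", r2_score(y_test, ridge_pred))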