Kernel Ridge Regression
Kernel ridge regression is a regression model that combines ridge regression with the kernel trick.
Ridge Regression
Ridge regression is a linear regression model that uses a least squares loss function with L2 regularization. The loss function combines the least squares loss with an L2 regularization term, allowing the model to find a best-fit line that generalizes beyond the training data. It can be written as shown below:
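A standard way to write the ridge objective, where w is the coefficient vector, (x_i, y_i) are the training samples, and λ controls the regularization strength, is:

\min_{\mathbf{w}} \; \sum_{i=1}^{n} \left( y_i - \mathbf{x}_i^\top \mathbf{w} \right)^2 + \lambda \lVert \mathbf{w} \rVert_2^2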
The L2 regularization term adds the squared magnitude of the coefficients as a penalty to the loss function. It distributes the impact of correlated features more evenly among the coefficients and prevents any one feature from dominating the model’s predictions, which reduces overfitting and stabilizes the solution, as the short sketch below illustrates.
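Here is a minimal sketch of that effect using scikit-learn's Ridge on two strongly correlated features (the data and alpha values are illustrative only):

import numpy as np
from sklearn.linear_model import Ridge

# Two nearly identical (strongly correlated) features
rng = np.random.RandomState(0)
x1 = rng.randn(50)
X = np.column_stack([x1, x1 + 0.01 * rng.randn(50)])
y = X[:, 0] + X[:, 1] + 0.1 * rng.randn(50)

# With little regularization the weights can be large and lopsided;
# increasing alpha shrinks them toward smaller, more evenly split values
for alpha in [0.01, 1.0, 100.0]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(alpha, coefs)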
Applying the Kernel Trick
However, not all relationships in data are linear. In the real world, data can follow complex, nonlinear patterns, making it difficult to model with plain linear regression.
The kernel trick allows us to form a more complex model in the original feature space without incurring the huge computing cost of explicitly transforming the data into a higher-dimensional space. Instead of mapping the data and computing dot products in that higher-dimensional space, a kernel function computes the same inner product directly from the original, lower-dimensional inputs. The more dimensions involved, the more expensive the explicit computation becomes, so the kernel function saves substantial work. Because the data is effectively mapped to a higher dimension, there is a greater chance of overfitting the model, and ridge regression balances out this critical issue by incorporating L2 regularization.
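The following minimal NumPy sketch illustrates the trick for a degree-2 polynomial kernel; the explicit feature map phi here is only for illustration:

import numpy as np

# Explicit feature map for a degree-2 polynomial kernel on 2-D inputs:
# phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)
def phi(x):
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

# Dot product after explicitly mapping both points to the higher-dimensional space
explicit = np.dot(phi(x), phi(y))

# Kernel trick: (x . y)^2 gives the same value without leaving the original space
kernel = np.dot(x, y) ** 2

print(explicit, kernel)  # both equal 121 (up to floating-point rounding)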
Syntax
from sklearn.kernel_ridge import KernelRidge
# Initialize the model
model = KernelRidge()
# Train the model with training data
model.fit(x_train, y_train)
# Use the model to predict the outcomes for testing data
predictions = model.predict(x_test)
KernelRidge has the following parameters:
- alpha (float or array of floats with shape (n_targets,), default=1.0): Regularization strength; larger values improve the conditioning of the problem and reduce the variance of the estimates.
- kernel (str or callable, default='linear'): Kernel mapping, specified as one of the kernels in pairwise.PAIRWISE_KERNEL_FUNCTIONS. Options include radial basis function ('rbf'), laplacian, polynomial, exponential chi2, and sigmoid kernels.
- gamma (float, default=None): Gamma parameter for the chosen kernel.
- degree (float, default=3): Degree of the polynomial kernel; ignored by other kernels.
- coef0 (float, default=1): Zero coefficient for the polynomial and sigmoid kernels; ignored by other kernels.
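For example, a model using a polynomial kernel could be configured as follows (the specific parameter values here are illustrative, not recommendations):

from sklearn.kernel_ridge import KernelRidge

# Degree-2 polynomial kernel: (gamma * <x, y> + coef0) ** degree
model = KernelRidge(kernel='polynomial', alpha=0.5, gamma=0.1, degree=2, coef0=1)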
Example
The example below shows kernel ridge regression analysis by generating a dataset, fitting a KernelRidge model to the training data, and making predictions on the test data:
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load dataset
X, y = make_regression(n_samples=100, n_features=1, noise=0.1)

# Preprocess data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Kernel Ridge instance
krr = KernelRidge(kernel='rbf', alpha=1.0, gamma=0.1)

# Model the training data
krr.fit(X_train, y_train)

# Predict the testing data
y_pred = krr.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
Here is the output for the above example (the exact value will vary because the dataset is generated randomly):
Mean Squared Error: 9.33325172522716
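To see where the kernel pays off, the sketch below (a rough comparison, assuming a sine-shaped target that is not part of the example above) fits both a plain Ridge model and an RBF KernelRidge model to the same nonlinear data; the kernel model should typically report the lower error:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Nonlinear (sine-shaped) data that a straight line cannot fit well
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear ridge regression vs. kernel ridge regression with an RBF kernel
linear = Ridge(alpha=1.0).fit(X_train, y_train)
kernel = KernelRidge(kernel='rbf', alpha=1.0, gamma=0.5).fit(X_train, y_train)

print('Ridge MSE:', mean_squared_error(y_test, linear.predict(X_test)))
print('KernelRidge MSE:', mean_squared_error(y_test, kernel.predict(X_test)))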