Gaussian Processes


Gaussian Processes are a supervised learning framework that models outcomes as distributions rather than single values, assuming that the function values at any finite set of input points follow a joint (multivariate) Gaussian distribution.

They are well suited to modeling complex, nonlinear relationships and to quantifying the uncertainty of each prediction.
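
To make this assumption concrete, here is a minimal NumPy sketch (the zero mean and length scale of 1.0 are illustrative choices) that draws a few sample functions from a Gaussian Process prior with an RBF covariance:

import numpy as np

# Evaluate the prior at a finite set of input points
x = np.linspace(0, 10, 50)

# RBF (squared exponential) covariance: k(x, x') = exp(-(x - x')**2 / (2 * length_scale**2))
length_scale = 1.0
cov = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * length_scale ** 2))

# A small jitter on the diagonal keeps the covariance matrix numerically well-behaved
cov += 1e-8 * np.eye(len(x))

# By the GP assumption, these function values follow a joint Gaussian distribution,
# so drawing sample functions is just sampling from a multivariate normal
samples = np.random.multivariate_normal(mean=np.zeros(len(x)), cov=cov, size=3)
print(samples.shape)  # (3, 50): three sampled functions, each evaluated at 50 points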

Gaussian Processes are implemented in Scikit-learn via GaussianProcessRegressor and GaussianProcessClassifier.

Syntax

Scikit-learn provides the sklearn.gaussian_process module, which implements Gaussian Process-based regression and classification.

Within this module, the GaussianProcessRegressor class handles regression tasks, and the GaussianProcessClassifier class handles classification tasks.

Here is the basic syntax for using them:

GaussianProcessRegressor

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Define the Gaussian Process Regressor
gp_regressor = GaussianProcessRegressor(kernel=RBF(), alpha=1e-10)
  • kernel: Defines the covariance function of the Gaussian Process. For example, RBF() is the radial basis function kernel, which measures the similarity between points based on their distance.
  • alpha: Value added to the diagonal of the kernel (covariance) matrix during fitting. It can be interpreted as the variance of additional Gaussian measurement noise and helps with numerical stability.
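
Kernels can also be combined. The following is an illustrative sketch (one possible configuration, not the only valid one) that scales an RBF kernel by a constant factor and sets a small alpha for stability:

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Scale an RBF kernel by a constant factor; both hyperparameters are tuned during fitting
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)

# alpha adds a small value to the diagonal of the kernel matrix for numerical stability
gp_regressor = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, n_restarts_optimizer=5)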

GaussianProcessClassifier

from sklearn.gaussian_process import GaussianProcessClassifier

# Define the Gaussian Process Classifier
gp_classifier = GaussianProcessClassifier(kernel=None, n_restarts_optimizer=10)
  • kernel: Defines the covariance function of the Gaussian Process. If set to None (the default), the classifier uses a constant-scaled radial basis function kernel, 1.0 * RBF(1.0).
  • n_restarts_optimizer: Number of times the optimizer is restarted from new initial values when searching for the kernel hyperparameters that maximize the log-marginal likelihood. Higher values can improve results at the cost of extra computation time.
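
As a rough sketch of how the classifier is used (the toy data below is made up for illustration):

import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Toy binary classification data: class 0 below x = 5, class 1 above
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])

gp_classifier = GaussianProcessClassifier(kernel=RBF(length_scale=1.0), n_restarts_optimizer=10)
gp_classifier.fit(X, y)

# predict returns hard labels; predict_proba returns class probabilities
print(gp_classifier.predict([[4.0]]))
print(gp_classifier.predict_proba([[4.0]]))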

Example

This example demonstrates using GaussianProcessRegressor to model a sine function:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
# Generate sample data
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.normal(size=100) # Add noise
# Define the kernel
kernel = RBF(length_scale=1.0)
# Train the Gaussian Process Regressor
gp = GaussianProcessRegressor(kernel=kernel, alpha=0.1)
gp.fit(X, y)
# Make predictions
X_test = np.linspace(0, 10, 100).reshape(-1, 1)
y_pred, sigma = gp.predict(X_test, return_std=True)
print("First 5 Predictions:", y_pred[:5])
print("First 5 Uncertainties (std_dev):", sigma[:5])

Running this example produces output similar to the following (the exact values vary because the noise is random):

First 5 Predictions: [ 0.053 0.087 0.119 0.148 0.175]
First 5 Uncertainties (std_dev): [0.1 0.1 0.1 0.1 0.1]
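
The standard deviations returned by predict are often used to draw a confidence band around the mean prediction. Here is a minimal sketch (assuming Matplotlib is installed; the 1.96 factor gives an approximate 95% interval) that repeats the fit above and plots the result:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Same noisy sine data and model as in the example above
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * np.random.normal(size=100)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1).fit(X, y)

y_pred, sigma = gp.predict(X, return_std=True)
x = X.ravel()

# Mean prediction with an approximate 95% confidence band (mean +/- 1.96 standard deviations)
plt.scatter(x, y, s=10, label="noisy observations")
plt.plot(x, y_pred, label="mean prediction")
plt.fill_between(x, y_pred - 1.96 * sigma, y_pred + 1.96 * sigma, alpha=0.3, label="95% interval")
plt.legend()
plt.show()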

Codebyte Example

Try this example to experiment with GaussianProcessRegressor for modeling data:

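A minimal, self-contained sketch along those lines (the data below is illustrative; scikit-learn and NumPy are assumed to be installed):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Illustrative training data: a handful of observations of y = x**2 / 10
X_train = np.array([[1.0], [3.0], [5.0], [6.0], [8.0]])
y_train = np.array([0.1, 0.9, 2.5, 3.6, 6.4])

# Fit a Gaussian Process with an RBF kernel
gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), alpha=0.05)
gp.fit(X_train, y_train)

# Predict at new points, including the uncertainty of each prediction
X_new = np.array([[2.0], [4.0], [7.0]])
mean, std = gp.predict(X_new, return_std=True)
print("Predictions:", mean)
print("Uncertainties:", std)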
