
Introduction to SHAP

Global SHAP Values

SHAP values can offer global explanations by identifying overall trends in feature importance across a dataset. This involves aggregating SHAP values, like computing the mean absolute SHAP values for each feature, to provide insight into which features are most influential.

```python
import shap
import numpy as np
# Compute SHAP values
explainer = shap.Explainer(model, X_train)
shap_values = explainer(X_test)
# Calculate mean absolute SHAP values for global explanations
global_shap = np.abs(shap_values.values).mean(axis=0)
```
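
To make the aggregated scores easier to read, they can be paired with feature names and sorted. A minimal sketch, assuming `X_test` is a pandas DataFrame whose columns line up with the SHAP value columns:

```python
# Rank features by mean absolute SHAP value (largest impact first)
ranking = sorted(zip(X_test.columns, global_shap), key=lambda pair: pair[1], reverse=True)
for feature, importance in ranking:
    print(f"{feature}: {importance:.4f}")
```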

Python SHAP Values

SHAP values offer insight into how features affect model predictions. They assign an impact score to each feature for every prediction, producing local explanations of the model’s output. In Python, the shap library computes these values so you can see how much each variable contributed to an individual outcome.

```python
import shap
# Create a SHAP explainer
explainer = shap.Explainer(model, X_train)
# Calculate SHAP values for a specific prediction (pass a single row as a one-row DataFrame)
shap_values = explainer(X_test.iloc[[0]])
```
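
The returned Explanation object carries everything needed to read this local explanation: the per-feature SHAP values, the base (expected) value, and the underlying feature data. A small sketch of how it might be inspected, continuing from the code above:

```python
# Inspect the pieces of the local explanation
single = shap_values[0]
print("Base value (average model output):", single.base_values)
print("Feature values:", single.data)
print("SHAP values per feature:", single.values)
```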

SHAP Universal Explainer

The SHAP library allows users to automatically select the most suitable explainer based on their model type. This makes it easier to calculate SHAP values for diverse data forms, enhancing interpretability across both structured and unstructured datasets. Below is an example of how to utilize SHAP’s universal explainer with Python.

```python
import shap
# Initialize SHAP universal explainer (auto-selects the algorithm)
explainer = shap.Explainer(model, X_train)
# Compute SHAP values
shap_values = explainer(X_test)
```

Python SHAP Framework

SHAP (SHapley Additive exPlanations) provides insight into how features influence predictions in machine learning models. It’s applicable across various models, including linear regression and deep learning. Using SHAP, you quantify each feature’s contribution, leveraging game theory principles. The impact is distributed among the features so that the sum of the SHAP values equals the difference between the model’s prediction and the average (baseline) prediction.

```python
import shap
import numpy as np
# Explain the model predictions using SHAP
explainer = shap.Explainer(model, X_train)
# SHAP values for a single row (X_row: a one-row DataFrame, e.g. X_test.iloc[[0]])
shap_values = explainer(X_row)
shap_values_sum = shap_values.values.sum()
# The SHAP values sum to the prediction minus the base (average) prediction
diff_model_prediction = model.predict(X_row)[0] - shap_values.base_values[0]
np.allclose(shap_values_sum, diff_model_prediction)
```

Python SHAP Additive

SHAP values distribute the difference between the model’s average prediction and a specific prediction across the features, which guarantees that the explanation adds up to the model’s output. In Python, you can easily calculate SHAP values with the shap library, making it an excellent tool for understanding model behavior. The sum of the SHAP values for a prediction equals the difference between that prediction and the model’s average prediction.

```python
import shap
import numpy as np
# Explain the model predictions using SHAP
explainer = shap.Explainer(model, X_train)
# SHAP values for a single row (X_row: a one-row DataFrame, e.g. X_test.iloc[[0]])
shap_values = explainer(X_row)
shap_values_sum = shap_values.values.sum()
# The SHAP values sum to the prediction minus the base (average) prediction
diff_model_prediction = model.predict(X_row)[0] - shap_values.base_values[0]
np.allclose(shap_values_sum, diff_model_prediction)
```

SHAP Explainers in Python

SHAP offers specialized explainers for different kinds of ML models: LinearExplainer for linear models, TreeExplainer for tree-based models, and DeepExplainer for neural networks. KernelExplainer is model-agnostic and can be applied to a wide array of models.

```python
import shap
# Create universal SHAP explainer, which picks the best explainer type automatically
explainer = shap.Explainer(model, X)
# Or choose a specific SHAP explainer based on the model type
explainer_linear = shap.LinearExplainer(model, X_train)
explainer_tree = shap.TreeExplainer(model, X_train)
explainer_kernel = shap.KernelExplainer(model.predict, X_train)
# Calculate SHAP values with the chosen explainer
shap_values = explainer(X)
```
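
DeepExplainer, mentioned above, targets deep learning models rather than tabular estimators. A minimal sketch, where `torch_model`, `background`, and `inputs_to_explain` are hypothetical names for a trained PyTorch model, a tensor of background samples, and a batch of inputs to explain:

```python
import shap
import torch

# Background data: a small sample of training inputs as a tensor
explainer_deep = shap.DeepExplainer(torch_model, background)
# SHAP values for a batch of inputs to explain
deep_shap_values = explainer_deep.shap_values(inputs_to_explain)
```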

SHAP Classification Values in Python

SHAP values offer insight into model predictions by attributing the contribution of each feature to the output. For classification models, SHAP values can be presented in logits or probabilities. This provides clarity and transparency in understanding how individual inputs affect predictions.

```python
import shap
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
# Explain the classifier in log-odds (logits)
explainer_logits = shap.Explainer(model, X_train)
shap_values_logits = explainer_logits(X_test)
print('Local explanations (log-odds):')
for feature, value in zip(model.feature_names_in_, shap_values_logits.values[0]):
    print(f"{feature}: {value:.4f}")
# Explain the classifier in probabilities
explainer_prob = shap.Explainer(model.predict_proba, X_train)
shap_values_prob = explainer_prob(X_test)
print('Local explanations (probability of class 1):')
for feature, value in zip(model.feature_names_in_, shap_values_prob.values[0][:, 1]):
    print(f"{feature}: {value:.4f}")
```
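
To move from these local explanations to a global view of the classifier, the probability-space SHAP values can be aggregated per feature. A minimal sketch, assuming the binary setup above, where index 1 in the last axis holds the class-1 contributions:

```python
import numpy as np
# Mean absolute SHAP value per feature for class 1, across the test set
global_class1 = np.abs(shap_values_prob.values[:, :, 1]).mean(axis=0)
for feature, importance in zip(model.feature_names_in_, global_class1):
    print(f"{feature}: {importance:.4f}")
```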

Python SHAP Plots

SHAP visualizations like waterfall and force plots are powerful for understanding model predictions. They show how each feature contributes to a specific prediction by incrementally highlighting their positive and negative impacts. These visualizations help in interpreting how far predictions deviate from the average.

```python
import shap
import matplotlib.pyplot as plt
explainer = shap.Explainer(model, X)
# Calculate SHAP values with the chosen explainer
shap_values = explainer(X)
# Visualize the waterfall plot for a single instance
shap.plots.waterfall(shap_values[0])
# Visualize the force plot
shap.plots.force(shap_values[0])
```
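
Force plots render as interactive JavaScript by default (after calling shap.initjs() in a notebook); passing matplotlib=True to shap.plots.force produces a static matplotlib figure instead.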

SHAP in Python

SHAP (SHapley Additive exPlanations) enhances the interpretability of text classifiers by determining the significance of individual words or phrases in the model’s decisions. This makes text-based predictions clearer. Discover how this is done using the Python language with the provided code.

```python
import shap
# Example text
text = "This coffee maker is fantastic! It’s easy to use, and my coffee tastes better than ever."
# Initialize a SHAP explainer for the text classifier
explainer = shap.Explainer(classifier)
# Generate SHAP values; the explainer expects a list of texts
shap_values = explainer([text])
# Plot the SHAP text explanations for the first text
shap.plots.text(shap_values[0], display=True)
```
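
The snippet above assumes `classifier` is already defined. One way to obtain such a classifier, sketched here with a Hugging Face pipeline (the task and settings are illustrative assumptions, not part of the original example):

```python
from transformers import pipeline
# Hypothetical text classifier; SHAP can explain transformers pipelines directly
classifier = pipeline("sentiment-analysis", return_all_scores=True)
```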

SHAP in Python

SHAP stands for SHapley Additive exPlanations, a method for explaining model output by assigning each feature an importance value. In image classification, SHAP highlights which pixels influenced a model’s decision. This review card describes SHAP use in Python; see the code example for an implementation.

```python
import shap
import numpy as np
import torch
import matplotlib.pyplot as plt
from PIL import Image
from transformers import AutoFeatureExtractor, AutoModelForImageClassification

# Step 1: Load the feature extractor and a lightweight pre-trained model
model_name = "google/mobilenet_v2_1.0_224"  # Lightweight model
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)
model.eval()  # Set model to evaluation mode (important for SHAP)

# Step 2: Define a prediction function for SHAP
def predict(images):
    """Prediction function for SHAP."""
    inputs = feature_extractor(images=images, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probabilities = torch.nn.functional.softmax(logits, dim=1).numpy()
    return probabilities

# Step 3: Load and preprocess the image
image_path = "dog.4028.jpg"  # Replace with your image path
image = Image.open(image_path).convert("RGB")  # Ensure the image is in RGB format
image = image.resize((224, 224))  # Resize to the model's expected input size
image_array = np.array(image)

# Step 4: Get class labels from the model
id2label = model.config.id2label  # Hugging Face stores class names in model.config.id2label
labels = [id2label[i] for i in range(len(id2label))]

# Step 5: Get predictions and top-K classes
top_k = 5  # Number of top classes to display
probs = predict([image_array])[0]  # Get probabilities for the image
top_k_indices = np.argsort(probs)[-top_k:][::-1]  # Indices of the top-K classes
top_k_probs = probs[top_k_indices]
top_k_labels = [labels[i] for i in top_k_indices]
# Display the top-K classes and probabilities
print("Top-K Predictions:")
for i in range(top_k):
    print(f"{top_k_labels[i]}: {top_k_probs[i]:.4f}")

# Step 6: Use the SHAP Partition explainer with an image masker
explainer = shap.Explainer(predict, shap.maskers.Image("inpaint_telea", image_array.shape))
shap_values = explainer(np.array([image_array]), max_evals=50, outputs=top_k_indices)
shap.image_plot(shap_values, np.array([image_array]), labels=top_k_labels)
```

Python SHAP Plots

SHAP plots help visualize the importance and impact of individual features across a dataset. Features with larger absolute SHAP values influence predictions more. Using plots like the bar or beeswarm plot, we can better understand model behavior.

```python
import shap
import matplotlib.pyplot as plt
# Assuming you have a trained model and data
explainer = shap.Explainer(model, data)
shap_values = explainer(data)
# Create a bar plot
shap.plots.bar(shap_values)
# Create a beeswarm plot
shap.plots.beeswarm(shap_values)
```

Python SHAP Analysis

With SHAP values in regression models, you can see how each feature moves the prediction away from the baseline (usually the mean prediction). These values explain a model’s outcome in the units of the target variable.

In the example below, each feature’s SHAP value (for a regression model) indicates how much that feature increases or decreases the target variable: temp decreases the predicted bike rentals by about 140, season increases them by about 582, and weekday decreases them by about 70.

```python
import shap
# Explain the model predictions using SHAP
explainer = shap.Explainer(model, X_train)
# SHAP values for a single row (X_row: a one-row DataFrame, e.g. X_test.iloc[[0]])
shap_values = explainer(X_row)
print('Local explanations:')
for feature, value in zip(model.feature_names_in_, shap_values.values[0]):
    print(f"{feature}: {value:.4f}")
```
```
Local explanations:
temp: -140.1314
season: 581.9802
weekday: -69.7472
```
