Multilabel Classification
In sklearn, multilabel classification assigns one or more labels to each instance, so a model predicts several binary outputs at once. This differs from traditional single-label classification, where each instance belongs to exactly one class.
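To make the difference from single-label classification concrete, the sketch below encodes a few label sets as a binary indicator matrix, the target format that scikit-learn's multilabel estimators work with. The genre tags are made-up illustration data and the variable names are hypothetical.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical label sets: each sample can carry any number of tags
sample_tags = [
    {"sci-fi", "thriller"},
    {"comedy"},
    {"comedy", "sci-fi", "thriller"},
]

# Convert the label sets into a binary indicator matrix (one column per label)
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(sample_tags)

print(binarizer.classes_)  # column order of the matrix (labels are sorted)
print(y)                   # one row per sample, 1 where the label applies
```

Each row of y can contain several 1s, whereas a single-label target would hold exactly one class per sample.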
Scikit-learn offers tools like OneVsRestClassifier, ClassifierChain, and MultiOutputClassifier to handle multilabel classification and enable efficient model training and evaluation.
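As a rough comparison of these three wrappers, the sketch below fits each one on the same synthetic data. The choice of LogisticRegression as the base estimator and all parameter values are assumptions made for illustration only.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Synthetic multilabel data (sizes chosen arbitrarily for the sketch)
X, y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, random_state=0
)

# Each wrapper clones the base estimator, so one instance can be shared
base = LogisticRegression(max_iter=1000)

wrappers = {
    # One independent binary classifier per label
    "OneVsRestClassifier": OneVsRestClassifier(base),
    # Also one independent classifier per label (output column)
    "MultiOutputClassifier": MultiOutputClassifier(base),
    # Classifiers linked in a chain: later ones see earlier label predictions
    "ClassifierChain": ClassifierChain(base, order="random", random_state=0),
}

predictions = {}
for name, model in wrappers.items():
    model.fit(X, y)
    predictions[name] = model.predict(X[:3])
    print(name, predictions[name].shape)
```

All three return one prediction per label for each sample. ClassifierChain is the only one of the three that lets the prediction for one label depend on the predictions for others.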
Syntax
Here’s the syntax for using multilabel classification in sklearn:
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# X is the feature matrix; y is a binary indicator matrix with one column per label
# (the split values below are illustrative)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 1: Initialize the base classifier
base_model = RandomForestClassifier(random_state=42)
# Step 2: Create a MultiOutputClassifier wrapper for multilabel classification
multi_label_model = MultiOutputClassifier(base_model)
# Step 3: Train the model using the training dataset
multi_label_model.fit(X_train, y_train)
# Step 4: Make predictions on the test dataset
predicted_labels = multi_label_model.predict(X_test)
# Step 5: Evaluate predictions or use the results
print(predicted_labels)
- RandomForestClassifier: The base classifier for multilabel classification.
- MultiOutputClassifier: A wrapper to extend the base classifier for multilabel tasks.
- Training and testing: The model is trained with fit() and predictions are made using predict().
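The evaluation step mentioned in the syntax can be sketched with a few multilabel-aware metrics from sklearn.metrics. The dataset, split sizes, and variable names below are assumptions for illustration, not part of the original syntax.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic data and a held-out test split (values are illustrative)
X, y = make_multilabel_classification(
    n_samples=300, n_features=10, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = MultiOutputClassifier(RandomForestClassifier(random_state=42))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Fraction of individual label assignments that are wrong (lower is better)
label_error = hamming_loss(y_test, y_pred)
# Fraction of samples whose entire label set is predicted exactly
exact_match = accuracy_score(y_test, y_pred)
# F1 score computed over all label decisions pooled together
micro_f1 = f1_score(y_test, y_pred, average="micro")

print(f"Hamming loss:    {label_error:.3f}")
print(f"Subset accuracy: {exact_match:.3f}")
print(f"Micro F1:        {micro_f1:.3f}")
```

Subset accuracy is strict because a sample counts as correct only when every label matches, so it is usually reported alongside a per-label measure such as Hamming loss.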
Example
This code demonstrates multilabel classification using scikit-learn by training a model to assign multiple labels:
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import classification_report
# Generate synthetic multilabel data
X, y = make_multilabel_classification(n_samples=100, n_features=10, n_classes=3, n_labels=2, random_state=42)
# Initialize a base classifier
base_classifier = RandomForestClassifier()
# Wrap the base classifier for multilabel classification
model = MultiOutputClassifier(base_classifier)
# Train the model
model.fit(X, y)
# Predict labels for new data
predictions = model.predict(X[:5])
# Display predictions
print("Predicted Labels for First 5 Samples:")
print(predictions)
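The code produces output similar to the following (exact values may vary between runs because the base classifier is created without a fixed random_state):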
Predicted Labels for First 5 Samples:
[[1 1 0]
 [1 1 0]
 [0 0 1]
 [1 1 1]
 [0 1 0]]
Codebyte Example
The following codebyte example trains a Random Forest classifier for multilabel classification on a dataset and predicts multiple categories for new samples:
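One way such an example could look is sketched below. The synthetic data, parameter values, and variable names (X_new, new_predictions) are assumptions chosen for illustration, not the original codebyte.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Synthetic dataset with 4 possible labels per sample (illustrative values)
X, y = make_multilabel_classification(
    n_samples=200, n_features=8, n_classes=4, random_state=7
)

# Hold out 20% of the samples to play the role of "new" data
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.2, random_state=7
)

# Random Forest wrapped so that one forest is trained per label
model = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=50, random_state=7)
)
model.fit(X_train, y_train)

# Each row of the prediction is a set of 0/1 decisions, one per label
new_predictions = model.predict(X_new)
print("Predicted label sets for the first 3 new samples:")
print(new_predictions[:3])

# Per-label precision, recall, and F1 on the held-out samples
print(classification_report(y_new, new_predictions, zero_division=0))
```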