Multiclass Classification
In Sklearn, Multiclass Classification is a supervised machine learning task where instances are categorized into one of three or more distinct classes. Unlike binary classification, which involves two classes, multiclass classification requires the model to differentiate among multiple categories.
Multiclass classification in Sklearn is implemented using algorithms such as Decision Trees, Support Vector Machines (SVMs), and Logistic Regression. Some of these algorithms support multiple classes natively, while others rely on strategies like One-vs-Rest (OvR) or One-vs-One (OvO), depending on the model and configuration.
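To make the difference between the two strategies concrete, the sketch below wraps a binary-capable estimator in the `OneVsRestClassifier` and `OneVsOneClassifier` meta-estimators from `sklearn.multiclass` and counts how many binary models each one trains. The 4-class synthetic dataset and the choice of `LogisticRegression` as the base estimator are assumptions made only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class dataset (parameter values chosen only for illustration)
X, y = make_classification(
    n_samples=400,
    n_features=10,
    n_informative=4,
    n_classes=4,
    random_state=0,
)

base = LogisticRegression(max_iter=1000)

# One-vs-Rest: one binary classifier per class
ovr = OneVsRestClassifier(base).fit(X, y)

# One-vs-One: one binary classifier per pair of classes
ovo = OneVsOneClassifier(base).fit(X, y)

n_ovr = len(ovr.estimators_)
n_ovo = len(ovo.estimators_)
print(f"OvR estimators: {n_ovr}")  # 4 classes -> 4 models
print(f"OvO estimators: {n_ovo}")  # 4 * 3 / 2 -> 6 models
```

With k classes, OvR trains k binary models while OvO trains k(k-1)/2, so OvO grows faster as the number of classes increases.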
Note: Sklearn offers many algorithms for multi-class classification.
Syntax
Below is an example syntax for performing multiclass classification using RandomForestClassifier in sklearn:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier # Replace with your classifier
from sklearn.metrics import classification_report
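# Generate a synthetic dataset
# (n_informative is raised above its default of 2, which is too low for 3 classes with the default 2 clusters per class)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, n_classes=3, random_state=42)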
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create the classifier (can be any model that supports multiclass classification)
clf = RandomForestClassifier(random_state=42)
# Fit the model
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate the model
print(classification_report(y_test, y_pred))
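Because every Sklearn classifier exposes the same `fit`/`predict`/`score` interface, the classifier in this workflow can be swapped without changing the surrounding code. The following is a minimal sketch of that idea; the particular set of models, the `candidates` and `scores` names, and the dataset parameters are assumptions chosen for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class dataset (illustrative parameters)
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, n_classes=3, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Any estimator that supports multiclass targets can be dropped in here
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "svm": SVC(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    # score() returns mean accuracy on the given test data
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```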
Example
The following example loads the iris dataset, splits it into training and testing sets (80% training, 20% testing), trains a RandomForestClassifier, makes predictions on the test data, and then calculates and prints the accuracy of the model:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load the Iris dataset (for multiclass classification)
data = load_iris()
X, y = data.data, data.target
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the RandomForestClassifier
model = RandomForestClassifier()
# Train the model on the training data
model.fit(X_train, y_train)
# Make predictions on the test data
y_pred = model.predict(X_test)
# Evaluate the model by calculating accuracy
accuracy = accuracy_score(y_test, y_pred)
# Print the accuracy of the model
print(f"Accuracy: {accuracy:.2f}")
The code produces output similar to the following (the classifier is not seeded, so the value can vary slightly between runs):
Accuracy: 1.00
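A score from a single split of a small dataset (here the test set holds only 30 samples) can be optimistic. One common way to get a more stable estimate is k-fold cross-validation; the sketch below uses `cross_val_score` with 5 folds, where the fold count and the fixed `random_state` are assumptions rather than part of the original example:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Seed the forest so repeated runs give the same folds' scores
model = RandomForestClassifier(random_state=42)

# For classifiers, an integer cv uses stratified folds by default
scores = cross_val_score(model, X, y, cv=5)

print("Per-fold accuracy:", scores.round(2))
print(f"Mean accuracy: {scores.mean():.2f}")
```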
Codebyte Example
The following codebyte example trains a Random Forest classifier for multiclass classification on synthetic data and predicts the category of a new product:
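A minimal sketch of such an example is shown here. The feature layout (`[price, weight_kg]`), the three category names, and the cluster centers are all hypothetical and exist only to generate plausible synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Hypothetical product categories; the index doubles as the class label
categories = ["Electronics", "Clothing", "Groceries"]

# Hypothetical features per product: [price, weight_kg]
centers = np.array([[500.0, 2.0], [50.0, 0.5], [5.0, 1.0]])

# 100 noisy samples around each category's center
X = np.vstack(
    [c + rng.normal(scale=[20.0, 0.2], size=(100, 2)) for c in centers]
)
y = np.repeat(np.arange(len(categories)), 100)

# Train the Random Forest on the synthetic products
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Predict the category of a new, unseen product
new_product = np.array([[480.0, 1.8]])
predicted = categories[model.predict(new_product)[0]]
print(f"Predicted category: {predicted}")
```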