Biclustering
Biclustering is an unsupervised machine learning technique that clusters the rows and columns of a data matrix at the same time to uncover previously unknown patterns. It is common in gene expression analysis, text mining, and recommendation systems, and it captures more localized relationships than conventional clustering, which groups rows using all columns at once. Scikit-learn provides two biclustering algorithms, spectral co-clustering and spectral biclustering, implemented as classes with a fit method.
Syntax
The following syntax shows how to set up biclustering with sklearn:
from sklearn.cluster import SpectralCoclustering, SpectralBiclustering
# For Spectral Co-clustering
model = SpectralCoclustering(n_clusters=number_of_biclusters, random_state=seed)
model.fit(data_matrix)
# For Spectral Bi-clustering
model = SpectralBiclustering(n_clusters=number_of_biclusters, method="log", random_state=seed)
model.fit(data_matrix)
- n_clusters: Number of biclusters to create. SpectralBiclustering also accepts a tuple of the form (n_row_clusters, n_column_clusters).
- random_state: Seeds the random number generator so that results are reproducible.
- method (for SpectralBiclustering): Specifies the algorithm variant, e.g., "log" or "bistochastic". The "log" method applies log normalization, while "bistochastic" repeatedly normalizes rows and columns. The choice of method can affect the results depending on the dataset.
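As a rough illustration of how the method choice can play out, the sketch below fits SpectralBiclustering once per normalization on synthetic checkerboard data. The dataset from make_checkerboard, the 30x20 shape, and the (3, 2) cluster counts are assumptions chosen for illustration, and "scale" is a third accepted method value that the list above does not cover.

```python
import numpy as np
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard

# Synthetic 30x20 matrix with a 3x2 checkerboard structure (illustrative values)
data_matrix, true_rows, true_cols = make_checkerboard(
    shape=(30, 20), n_clusters=(3, 2), noise=1.0, shuffle=True, random_state=0
)

labels_by_method = {}
for method in ("bistochastic", "scale", "log"):
    # Same data and seed each time; only the normalization method changes
    model = SpectralBiclustering(n_clusters=(3, 2), method=method, random_state=0)
    model.fit(data_matrix)
    labels_by_method[method] = (model.row_labels_, model.column_labels_)
    print(method, "row cluster sizes:", np.bincount(model.row_labels_, minlength=3))
```

Comparing the printed cluster sizes across methods gives a quick, informal sense of whether the normalization matters for a given dataset.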
Note: Since sklearn has no class named Bicluster, biclustering is performed with classes such as SpectralBiclustering.
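The co-clustering half of the syntax can be tried out in a similar way. This is a minimal sketch, assuming synthetic data from make_biclusters and using consensus_score to compare the recovered biclusters with the generated ones; the shape, noise level, and seed are illustrative choices.

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters
from sklearn.metrics import consensus_score

# Synthetic 40x30 matrix containing 3 biclusters, with rows and columns shuffled
data_matrix, true_rows, true_cols = make_biclusters(
    shape=(40, 30), n_clusters=3, noise=0.5, shuffle=True, random_state=0
)

model = SpectralCoclustering(n_clusters=3, random_state=0)
model.fit(data_matrix)

# biclusters_ is the (rows_, columns_) pair of boolean indicator arrays
score = consensus_score(model.biclusters_, (true_rows, true_cols))

print("Row labels:", model.row_labels_)
print("Column labels:", model.column_labels_)
print("Consensus score:", score)
```

A consensus score close to 1 means the recovered biclusters closely match the generated ones.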
Example
Here’s an example of implementing biclustering using SpectralBiclustering from sklearn:
import numpy as np
from sklearn.cluster import SpectralBiclustering

# Sample data matrix
data_matrix = np.array([[1, 1, 0, 0],
                        [1, 1, 0, 0],
                        [0, 0, 1, 1],
                        [0, 0, 1, 1]])

# Apply Spectral Biclustering
model = SpectralBiclustering(n_clusters=2, random_state=42)
model.fit(data_matrix)

# Get the bicluster labels for rows and columns
row_labels = model.rows_
column_labels = model.columns_

# Print biclusters
print("Row Biclusters:", row_labels)
print("Column Biclusters:", column_labels)
The above code results in the following output:
Row Biclusters: [[False False  True  True]
 [False False  True  True]
 [ True  True False False]
 [ True  True False False]]
Column Biclusters: [[False False  True  True]
 [ True  True False False]
 [False False  True  True]
 [ True  True False False]]
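- In the Row Biclusters, each inner array describes one bicluster, and True in a position means that the row at that index is part of the bicluster. With n_clusters=2, the model forms 2 row clusters and 2 column clusters, which gives 4 biclusters.
- Similarly, in the Column Biclusters, True indicates that the column at that index is part of the bicluster.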
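To make the indicator arrays more concrete, the following sketch turns each pair of row and column indicators into the block of the data matrix it selects. It uses only NumPy and copies the indicator arrays from the output above instead of refitting the model; the variable names are illustrative.

```python
import numpy as np

data_matrix = np.array([[1, 1, 0, 0],
                        [1, 1, 0, 0],
                        [0, 0, 1, 1],
                        [0, 0, 1, 1]])

# Indicator arrays copied from the printed output (not recomputed here)
row_indicators = np.array([[False, False, True, True],
                           [False, False, True, True],
                           [True, True, False, False],
                           [True, True, False, False]])
col_indicators = np.array([[False, False, True, True],
                           [True, True, False, False],
                           [False, False, True, True],
                           [True, True, False, False]])

submatrices = []
for i, (row_mask, col_mask) in enumerate(zip(row_indicators, col_indicators)):
    row_idx = np.flatnonzero(row_mask)  # indices of rows in bicluster i
    col_idx = np.flatnonzero(col_mask)  # indices of columns in bicluster i
    sub = data_matrix[np.ix_(row_idx, col_idx)]
    submatrices.append(sub)
    print(f"Bicluster {i}: rows {row_idx.tolist()}, columns {col_idx.tolist()}")
    print(sub)
```

With these indicators, biclusters 0 and 3 are the two blocks of ones, and biclusters 1 and 2 are the two blocks of zeros. On a fitted model, get_indices(i) and get_submatrix(i, data_matrix) return the same information directly.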
Codebyte Example
This example demonstrates how to perform Spectral Biclustering on a simple 6x6 binary data matrix using SpectralBiclustering from sklearn:
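One possible version of such an example is sketched below. The matrix values are hypothetical (two 3x3 blocks of ones chosen for illustration), and the reordering step at the end is an added convenience for viewing the result.

```python
import numpy as np
from sklearn.cluster import SpectralBiclustering

# Hypothetical 6x6 binary matrix with two blocks of ones (illustrative values)
data_matrix = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
])

model = SpectralBiclustering(n_clusters=2, random_state=42)
model.fit(data_matrix)

# One cluster label per row and per column
print("Row labels:", model.row_labels_)
print("Column labels:", model.column_labels_)

# Reorder rows and columns so members of the same cluster sit together
reordered = data_matrix[np.argsort(model.row_labels_)]
reordered = reordered[:, np.argsort(model.column_labels_)]
print("Reordered matrix:")
print(reordered)
```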