Python:Sklearn Nearest Neighbors
In Scikit-learn, Nearest Neighbors is an essential algorithm for finding the closest data points in a dataset based on a defined distance metric. It supports tasks like classification, regression, and clustering by calculating similarities between data points. The algorithm is easy to implement and highly adaptable, making it a go-to choice for various machine learning applications.
Sklearn’s implementation offers several optimization techniques, including ball_tree and kd_tree, to enhance performance, especially for large datasets. Users can customize parameters like n_neighbors, metric, and algorithm to fine-tune the model for specific use cases. This versatility makes Nearest Neighbors particularly valuable in recommendation systems, anomaly detection, and exploratory data analysis.
Syntax
from sklearn.neighbors import NearestNeighbors
nbrs = NearestNeighbors(n_neighbors=5, algorithm='auto', metric='minkowski', p=2)
n_neighbors: The number of nearest neighbors to find for each data point.algorithm: Algorithm used to compute the nearest neighbors.'auto': Automatically selects the best algorithm.'ball_tree': Uses a Ball Tree structure.'kd_tree': Uses a KD Tree structure.'brute': Performs brute-force search.
metric: The distance metric to use.'euclidean': Standard Euclidean distance.'manhattan': Manhattan (L1) distance.'minkowski': Generalized Minkowski distance (requiresp).
p: Power parameter for the Minkowski metric.p=1: Equivalent to Manhattan distance.p=2: Equivalent to Euclidean distance.
Example
The following example uses Sklearn’s NearestNeighbors to find and display the indices and distances of the 2 nearest neighbors for each data point in a sample dataset:
from sklearn.neighbors import NearestNeighborsimport numpy as np# Sample dataset: Each row represents a data pointX = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])# Initialize the Nearest Neighbors modelnbrs = NearestNeighbors(n_neighbors=2, algorithm='auto')# Fit the model with the datanbrs.fit(X)# Query the Nearest Neighbors for the given datasetdistances, indices = nbrs.kneighbors(X)# Output the resultsprint("Indices of Nearest Neighbors:")print(indices) # Indices of the closest neighbors in the datasetprint("\nDistances to Nearest Neighbors:")print(distances) # Corresponding distances to those neighbors
The code above produces the following possible output:
Indices of Nearest Neighbors:[[0 1 2][1 0 2][2 1 3][3 2 1]]Distances to Nearest Neighbors:[[0. 4.24264069 7.81024968][0. 4.24264069 3.60555128][0. 4.24264069 3.16227766][0. 4.24264069 3.60555128]]
Contribute to Docs
- Learn more about how to get involved.
- Edit this page on GitHub to fix an error or make an improvement.
- Submit feedback to let us know how we can improve Docs.
Learn Python:Sklearn on Codecademy
- Machine Learning Data Scientists solve problems at scale, make predictions, find patterns, and more! They use Python, SQL, and algorithms.
- Includes 27 Courses
- With Professional Certification
- Beginner Friendly.95 hours
- Learn the basics of Python 3.12, one of the most powerful, versatile, and in-demand programming languages today.
- With Certificate
- Beginner Friendly.24 hours