You’ve now written your own K-Nearest Neighbor classifier from scratch! However, rather than writing your own classifier every time, you can use Python’s sklearn library.
sklearn is a Python library built specifically for Machine Learning. It has a huge number of features, but for now, we’re only going to investigate its K-Nearest Neighbor classifier.
There are a couple of steps we’ll need to go through in order to use the library. First, you need to create a KNeighborsClassifier object. The parameter we care about here is n_neighbors, which sets k. For example, the code below will create a classifier where k = 3:
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=3)
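To make the role of the parameter concrete, here is a minimal sketch that builds one classifier with an explicit k and one without. The name default_classifier is only for illustration; it assumes scikit-learn is installed.

```python
from sklearn.neighbors import KNeighborsClassifier

# Explicit k, as in the text: look at the 3 nearest neighbors.
classifier = KNeighborsClassifier(n_neighbors=3)
print(classifier.n_neighbors)

# Leaving the argument out falls back to scikit-learn's default of 5.
default_classifier = KNeighborsClassifier()
print(default_classifier.n_neighbors)
```

Creating the object does not look at any data yet; it only records the settings that will be used once the classifier is trained.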
Next, we’ll need to train our classifier. The .fit() method takes two parameters. The first is a list of points, and the second is the labels associated with those points. So for our movie example, we might have something like this:
training_points = [
  [0.5, 0.2, 0.1],
  [0.9, 0.7, 0.3],
  [0.4, 0.5, 0.7]
]
training_labels = [0, 1, 1]
classifier.fit(training_points, training_labels)
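The training step can be checked with a short, self-contained sketch. It reuses the three points from the text and then inspects what the classifier learned. The variables fitted and mismatch_raised are introduced here just for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

training_points = [
    [0.5, 0.2, 0.1],
    [0.9, 0.7, 0.3],
    [0.4, 0.5, 0.7],
]
training_labels = [0, 1, 1]

classifier = KNeighborsClassifier(n_neighbors=3)

# fit() stores the training data and returns the classifier itself.
fitted = classifier.fit(training_points, training_labels)
print(fitted is classifier)

# The distinct labels seen during training.
print(classifier.classes_)

# There must be exactly one label per point; otherwise fit() raises ValueError.
mismatch_raised = False
try:
    KNeighborsClassifier(n_neighbors=3).fit(training_points, [0, 1])
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

The order matters: the label at position i belongs to the point at position i.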
Finally, after training the model, we can classify new points. The .predict() method takes a list of points that you want to classify. It returns its guesses for those points, one guess per point, as a NumPy array:
unknown_points = [
  [0.2, 0.1, 0.7],
  [0.4, 0.7, 0.6],
  [0.5, 0.8, 0.1]
]
guesses = classifier.predict(unknown_points)
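Putting the three steps together, the following sketch runs the example data from the text end to end. One thing to notice: because there are only three training points and k = 3, every unknown point has the same three neighbors, so each guess is simply the majority label among [0, 1, 1].

```python
from sklearn.neighbors import KNeighborsClassifier

training_points = [
    [0.5, 0.2, 0.1],
    [0.9, 0.7, 0.3],
    [0.4, 0.5, 0.7],
]
training_labels = [0, 1, 1]

unknown_points = [
    [0.2, 0.1, 0.7],
    [0.4, 0.7, 0.6],
    [0.5, 0.8, 0.1],
]

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(training_points, training_labels)

# One guess per unknown point, in the same order as the input.
guesses = classifier.predict(unknown_points)
print(guesses)  # [1 1 1]
```

With a realistic dataset that has many more points than k, different unknown points will have different neighbors and the guesses will vary.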
We’ve imported sklearn for you. Create a KNeighborsClassifier named
classifier that uses
We’ve also imported some movie data. Train your classifier using movie_dataset as the training points and labels as the training labels.
Let’s classify some movies. Classify the following movies: [.45, .2, .5], [.25, .8, .9], [.1, .1, .9]. Print the classifications!
Which movies were classified as good movies and which were classified as bad movies?
Remember, those three numbers associated with a movie are the normalized budget, run time, and year of release.
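As a rough sketch of how the exercise fits together, the code below follows the same three steps. The real movie data is not shown in this lesson, so movie_dataset and labels here are small made-up stand-ins. The choice of k = 5 is arbitrary, and reading 1 as "good" and 0 as "bad" is an assumption. The results will differ from what you get with the imported data.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in data: [normalized budget, run time, year of release].
movie_dataset = [
    [0.10, 0.20, 0.90],
    [0.20, 0.10, 0.80],
    [0.30, 0.30, 0.95],
    [0.80, 0.90, 0.10],
    [0.90, 0.70, 0.20],
    [0.70, 0.80, 0.30],
]
# Hypothetical labels, assuming 1 means "good" and 0 means "bad".
labels = [1, 1, 1, 0, 0, 0]

# k = 5 is an arbitrary choice for this sketch.
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(movie_dataset, labels)

# The three movies from the exercise.
movies_to_classify = [
    [.45, .2, .5],
    [.25, .8, .9],
    [.1, .1, .9],
]
guesses = classifier.predict(movies_to_classify)

# Print each movie next to a readable version of its guess.
for movie, guess in zip(movies_to_classify, guesses):
    print(movie, "good" if guess == 1 else "bad")
```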