Nearest Neighbor:
- memorize all of the training data
- for a new image, find the most similar image in the training data under a distance metric, and predict the label of that most similar image

Distance metric to compare images (L1 in the code below):
import numpy as np

class NearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        """X is N x D where each row is an example. y is a 1-dimensional array of size N."""
        # the nearest neighbor classifier simply remembers all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """X is N x D where each row is an example we wish to predict a label for."""
        num_test = X.shape[0]
        # make sure that the output type matches the input type
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        # loop over all test rows
        for i in range(num_test):
            # find the nearest training image to the i'th test image
            # using the L1 distance (sum of absolute value differences)
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            min_index = np.argmin(distances)  # get the index with the smallest distance
            Ypred[i] = self.ytr[min_index]    # predict the label of the nearest example
        return Ypred
Train is O(1); predict is O(N).
This is backwards: we want classifiers that are fast at prediction; slow training is acceptable.
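A minimal usage sketch of the nearest-neighbor idea above, on toy data (the 2x2 "images" and labels are made up for illustration). It finds the closest training row under L1 distance and returns that row's label:

```python
import numpy as np

# Toy "training set": 4 flattened 2x2 images with made-up labels.
Xtr = np.array([[0, 0, 0, 0],
                [255, 255, 255, 255],
                [0, 0, 255, 255],
                [255, 255, 0, 0]])
ytr = np.array([0, 1, 2, 3])

# One test image; predict by finding the closest training row under L1.
x = np.array([10, 5, 250, 240])
distances = np.sum(np.abs(Xtr - x), axis=1)  # one distance per training example
pred = ytr[np.argmin(distances)]
print(pred)  # 2 (closest to the third training image)
```

Note how all the work happens at prediction time: every test image is compared against every training example.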
K-Nearest Neighbors:
Rather than looking for the single nearest neighbor, find the K nearest neighbors according to the distance metric, take a vote among them, and predict the majority label.
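The K-neighbor vote can be sketched as follows (a minimal version using L1 distance; the helper name `knn_predict` and the toy 2-D points are illustrative, not from the original):

```python
import numpy as np
from collections import Counter

def knn_predict(Xtr, ytr, x, k=3):
    # Distances from the test point to every training example (L1 here).
    distances = np.sum(np.abs(Xtr - x), axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k neighbors' labels.
    return Counter(ytr[nearest]).most_common(1)[0][0]

Xtr = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10]])
ytr = np.array([0, 0, 0, 1, 1])
print(knn_predict(Xtr, ytr, np.array([1, 1]), k=3))  # 0
```

With k=1 this reduces to plain nearest neighbor; larger k smooths the decision boundary.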
K-Nearest neighbors: Distance Metric
L1 (Manhattan) distance: d1(I1, I2) = sum over pixels p of |I1_p - I2_p|
L2 (Euclidean) distance: d2(I1, I2) = sqrt(sum over pixels p of (I1_p - I2_p)^2)
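Both metrics on the same pair of vectors, as a quick sketch (the sample values are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

# L1 (Manhattan): sum of absolute coordinate differences.
l1 = np.sum(np.abs(a - b))          # 3 + 2 + 0 = 5
# L2 (Euclidean): square root of the sum of squared differences.
l2 = np.sqrt(np.sum((a - b) ** 2))  # sqrt(9 + 4 + 0) = sqrt(13)
print(l1, l2)
```

L1 depends on the choice of coordinate axes, while L2 is rotation-invariant, which is one reason the choice of metric is treated as a hyperparameter.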
The value of K and distance are hyperparameters: choices about the algorithm that we set rather than learn.
Split data into train, validation and test; choose hyperparameters on validation and evaluate on test.
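The split-and-tune procedure can be sketched as below. The data is synthetic, and the sizes (60/20/20) and helper `knn_accuracy` are illustrative assumptions; the point is that k is chosen on the validation set and the test set is touched only once at the end:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)  # synthetic labels: sign of the first coordinate

# Split: 60 train, 20 validation, 20 test (illustrative sizes).
Xtr, ytr = X[:60], y[:60]
Xval, yval = X[60:80], y[60:80]
Xte, yte = X[80:], y[80:]

def knn_accuracy(Xtr, ytr, Xq, yq, k):
    correct = 0
    for x, label in zip(Xq, yq):
        d = np.sum(np.abs(Xtr - x), axis=1)  # L1 distances to all training points
        nearest = np.argsort(d)[:k]          # k closest training examples
        votes = np.bincount(ytr[nearest])    # count labels among the neighbors
        correct += (np.argmax(votes) == label)
    return correct / len(yq)

# Choose k on the validation set, then report accuracy once on test.
best_k = max([1, 3, 5, 7], key=lambda k: knn_accuracy(Xtr, ytr, Xval, yval, k))
print(best_k, knn_accuracy(Xtr, ytr, Xte, yte, best_k))
```

Never pick hyperparameters by test-set performance: that reuses the test set as a second training signal and gives an optimistic estimate of generalization.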
K-Nearest Neighbor is never used on images in practice:
- very slow at test time
- distance metrics on raw pixels are not informative