Challenges for image recognition:
- Viewpoint variation
- Scale variation
- Illumination conditions
- Background clutter
- Intra-class variation
A data-driven approach means that, instead of writing an explicit algorithm to separate the data, we provide a dataset with correct labels and learn an algorithm from examples.
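Images may be compared by the L1 distance, where the sum is taken over all pixels $p$:

$$d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$$

or by the L2 distance:

$$d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}$$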
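As a minimal sketch of both distances on raw pixel arrays, assuming images are stored as NumPy arrays of equal shape: the function names `l1_distance` and `l2_distance` and the tiny 2x2 images are illustrative and not part of the original notes.

```python
import numpy as np

def l1_distance(a, b):
    # Cast to a signed type so that uint8 pixel differences do not wrap around.
    diff = a.astype(np.int64) - b.astype(np.int64)
    return np.sum(np.abs(diff))

def l2_distance(a, b):
    diff = a.astype(np.float64) - b.astype(np.float64)
    return np.sqrt(np.sum(diff ** 2))

# Two hypothetical 2x2 grayscale images.
img1 = np.array([[56, 32], [90, 23]], dtype=np.uint8)
img2 = np.array([[10, 20], [8, 10]], dtype=np.uint8)

print(l1_distance(img1, img2))  # pixel differences are 46, 12, 82, 13
print(l2_distance(img1, img2))
```

The cast matters in practice: subtracting two `uint8` arrays directly wraps around on negative results and silently gives wrong distances.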
In code, a simple classifier can be implemented as:

    import numpy as np

    class NearestNeighbor:
        def train(self, data, labels):
            # the classifier simply memorizes the training data
            self.Xtr = data
            self.Ytr = labels

        def predict(self, X):
            """X is an array of images, one flattened image per row."""
            predictions = []
            for idx in range(X.shape[0]):
                # compute the L1 distance to every training image
                distances = np.sum(np.abs(self.Xtr - X[idx, :]), axis=1)
                min_index = np.argmin(distances)
                predictions.append(self.Ytr[min_index])
            return np.array(predictions)
L1 vs L2. The L2 distance is much more unforgiving than the L1 distance when it comes to differences between two vectors, because it squares each difference before summing. That is, the L2 distance prefers many medium disagreements to one big one.
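A small numeric illustration of this preference, using two made-up difference vectors that have the same L1 norm (the names `spread` and `concentrated` are only for this example):

```python
import numpy as np

# Two hypothetical per-pixel difference vectors with the same total absolute difference.
spread = np.array([2.0, 2.0, 2.0, 2.0])        # many medium disagreements
concentrated = np.array([8.0, 0.0, 0.0, 0.0])  # one big disagreement

l1_spread = np.sum(np.abs(spread))
l1_concentrated = np.sum(np.abs(concentrated))

l2_spread = np.sqrt(np.sum(spread ** 2))
l2_concentrated = np.sqrt(np.sum(concentrated ** 2))

print(l1_spread, l1_concentrated)  # equal under L1
print(l2_spread, l2_concentrated)  # L2 penalizes the single large difference more
```

Under L1 the two vectors are equally far from zero, while under L2 the concentrated one is twice as far, so an L2-based classifier would treat the spread-out disagreement as the closer match.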
The k-nearest neighbor (kNN) classifier works the same way as the nearest neighbor classifier, but instead of finding the single closest image in the training set, it finds the top k closest images and has them vote on the label of the test image.
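One possible sketch of the voting step, keeping the same `train`/`predict` interface and L1 distance as the nearest neighbor classifier. The class name `KNearestNeighbor`, the `k` parameter, and the tie-breaking rule are assumptions made for this example, and labels are assumed to be a NumPy array.

```python
import numpy as np
from collections import Counter

class KNearestNeighbor:
    def __init__(self, k=3):
        self.k = k

    def train(self, data, labels):
        # As with nearest neighbor, training only memorizes the data.
        self.Xtr = data
        self.Ytr = labels

    def predict(self, X):
        """X is an array of images, one flattened image per row."""
        predictions = []
        for idx in range(X.shape[0]):
            # L1 distance from the test image to every training image
            distances = np.sum(np.abs(self.Xtr - X[idx, :]), axis=1)
            # indices of the k closest training images, nearest first
            nearest = np.argsort(distances)[:self.k]
            # majority vote; on ties Counter keeps the label seen first,
            # i.e. the one belonging to the nearer neighbor
            votes = Counter(self.Ytr[nearest].tolist())
            predictions.append(votes.most_common(1)[0][0])
        return np.array(predictions)
```

With `k=1` this reduces to the plain nearest neighbor classifier; larger values of `k` smooth the decision and make it less sensitive to a single outlier in the training set.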
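Cross-validation means that, instead of fixing a single split into training and validation sets, the training data is split into k folds and k runs are performed, each run using a different fold as the validation set and the remaining folds for training. The validation results are then averaged over the k runs.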
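The fold rotation could be sketched as follows. The helpers `k_fold_indices` and `cross_validate`, and the `fit_predict(X_train, y_train, X_val)` callback, are hypothetical names introduced for this example; the sketch assumes the data has already been shuffled and that `num_folds` is at least 2.

```python
import numpy as np

def k_fold_indices(num_samples, num_folds):
    """Yield (train_idx, val_idx) pairs, one per fold."""
    folds = np.array_split(np.arange(num_samples), num_folds)
    for i in range(num_folds):
        val_idx = folds[i]
        # all remaining folds together form the training set for this run
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, val_idx

def cross_validate(fit_predict, X, y, num_folds=5):
    """Average validation accuracy over num_folds runs."""
    accuracies = []
    for train_idx, val_idx in k_fold_indices(len(X), num_folds):
        predictions = fit_predict(X[train_idx], y[train_idx], X[val_idx])
        accuracies.append(np.mean(predictions == y[val_idx]))
    return float(np.mean(accuracies))
```

A typical use is to call `cross_validate` once per candidate hyperparameter value (for example each `k` of a k-nearest neighbor classifier) and keep the value with the highest average accuracy.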