L1 (Manhattan) Distance
- Depends on your choice of coordinate system
- If you were to rotate the coordinate frame, the L1 distance between points would actually change
- If the individual entries of your input vectors carry important meaning for your task, L1 might be a more natural fit
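A quick sketch of the coordinate-dependence point above, using NumPy and made-up 2-D points: rotating the frame changes the L1 distance.

```python
import numpy as np

def l1(x, y):
    # L1 (Manhattan) distance: sum of absolute coordinate differences
    return np.sum(np.abs(x - y))

# Illustrative 2-D points; any vectors behave the same way
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
print(l1(a, b))  # 2.0 in the original axes

# Rotate both points by 45 degrees: the L1 distance changes
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(l1(R @ a, R @ b))  # ~1.414 after rotation
```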
L2 (Euclidean) Distance
The L2 distance has the geometric interpretation of computing the straight-line Euclidean distance between two vectors. The distance takes the form: d2(I1, I2) = sqrt(Σ_p (I1^p − I2^p)^2), summing squared differences over every element p.
- If you were to rotate the coordinate frame, it would not affect the L2 distance between points
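The rotation-invariance claim can be checked directly with the same made-up points (a NumPy sketch, not part of the original notes):

```python
import numpy as np

def l2(x, y):
    # L2 (Euclidean) distance: square root of summed squared differences
    return np.sqrt(np.sum((x - y) ** 2))

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Rotate both points by the same angle; any angle gives the same distance
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(l2(a, b))          # sqrt(2)
print(l2(R @ a, R @ b))  # same sqrt(2): rotation preserves L2
```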
By using different distance metrics, we can generalize the K-Nearest Neighbors classifier to many different types of data - not just vectors or images. For example, to classify pieces of text, you would specify a distance function that measures the distance between two paragraphs or two sentences. By specifying different distance metrics, we can apply this algorithm quite generally to nearly any type of data.
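To make the "pluggable metric" idea concrete, here is a minimal 1-NN sketch where the distance function is a parameter (function names and the tiny training set are illustrative, not from the notes):

```python
import numpy as np

def nearest_neighbor_predict(train_X, train_y, x, dist):
    # Classify x with the label of its single nearest training point,
    # where "nearest" is defined by whatever distance function is passed in
    dists = [dist(row, x) for row in train_X]
    return train_y[int(np.argmin(dists))]

l1 = lambda u, v: np.sum(np.abs(u - v))
l2 = lambda u, v: np.sqrt(np.sum((u - v) ** 2))

# Toy training data: one point per class
train_X = np.array([[0.0, 0.0], [5.0, 5.0]])
train_y = np.array([0, 1])

print(nearest_neighbor_predict(train_X, train_y, np.array([1.0, 1.0]), l1))  # 0
print(nearest_neighbor_predict(train_X, train_y, np.array([4.0, 4.0]), l2))  # 1
```

Swapping in a string- or document-level distance (e.g. edit distance between sentences) is all it would take to apply the same classifier to text.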
Q/A:
- Q: What is actually happening geometrically when you choose different distance metrics?
- A:
- With L1, decision boundaries tend to follow the coordinate axes, because L1 depends on choice of coordinate system.
- L2 doesn't care about the choice of coordinate system, so its decision boundaries can be oriented any way and tend to fall where the data naturally separates
- Q: Where would L1 distance be preferable to using L2 distance?
- A: Problem dependent. Since L1 depends on the coordinate system of your data, L1 would be a good choice if you know your vectors' individual elements have meaning
    - E.g. classifying employees, where different elements of the vector correspond to different features/aspects of an employee (salary, years worked)
    - This is a hyperparameter - so the best answer is to try both L1 and L2 and see which works better.
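Treating the metric as a hyperparameter can be sketched as follows: score each metric on a held-out validation split and keep the winner (the data here is synthetic and the helper names are made up for illustration):

```python
import numpy as np

# Synthetic 2-D data with a simple linear rule, split into train/validation
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

def predict_1nn(x, dist):
    # 1-NN prediction under the given distance function
    d = [dist(row, x) for row in X_tr]
    return y_tr[int(np.argmin(d))]

metrics = {
    "L1": lambda u, v: np.sum(np.abs(u - v)),
    "L2": lambda u, v: np.sqrt(np.sum((u - v) ** 2)),
}

# Pick whichever metric scores higher on the validation set
for name, dist in metrics.items():
    acc = np.mean([predict_1nn(x, dist) == t for x, t in zip(X_val, y_val)])
    print(name, acc)
```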