Performance comparison of k nearest neighbor classifier with different distance functions

Bibliographic Details
Published in: AIP Conference Proceedings
Main Author: Mukahar N.
Format: Conference paper
Language: English
Published: American Institute of Physics 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85188428801&doi=10.1063%2f5.0192229&partnerID=40&md5=a5c61175518729d3d2350a9798e65ad9
Description
Summary: In pattern recognition, K Nearest Neighbor (KNN) is a classification algorithm that uses a distance function to measure the similarity between two samples. The most widely used distance function is the Euclidean distance, which treats all samples, including noisy samples and outliers, as equally important. Euclidean distance is therefore highly influenced by noisy samples and outliers, which can distort the similarity values and in turn degrade classification performance. This paper experimentally compares several distance functions for KNN classification: Manhattan, Angular, Chebyshev, Cosine, Euclidean, Histogram, Kolmogorov, Mahalanobis, Match and Minkowski. The distance functions are evaluated on 31 selected real-world datasets of different natures from the UCI repository, and the results show that Manhattan outperforms the other distance functions, achieving a classification accuracy of 84.63%. © 2024 AIP Publishing LLC.
ISSN: 0094-243X
DOI: 10.1063/5.0192229
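
The idea described in the summary, a KNN classifier whose distance function can be swapped, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the toy data, and the choice of k are assumptions made for illustration, and only five of the ten listed distance functions are shown, using their standard textbook definitions.

```python
import math
from collections import Counter


def manhattan(a, b):
    # Sum of absolute coordinate differences (L1 norm).
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    # Square root of the sum of squared differences (L2 norm).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    # Largest absolute coordinate difference (L-infinity norm).
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3):
    # Generalisation of L1 (p=1) and L2 (p=2); p=3 is an arbitrary default.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine(a, b):
    # One minus the cosine of the angle between the two vectors.
    # Assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train_X, train_y, query, k, distance):
    # Rank training samples by distance to the query, then take a
    # majority vote among the k closest labels.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data, not taken from the UCI datasets used in the paper.
TRAIN_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
           (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
TRAIN_Y = ["a", "a", "a", "b", "b", "b"]

if __name__ == "__main__":
    query = (2.0, 2.0)
    for name, fn in [("Manhattan", manhattan), ("Euclidean", euclidean),
                     ("Chebyshev", chebyshev), ("Minkowski", minkowski),
                     ("Cosine", cosine)]:
        print(name, knn_predict(TRAIN_X, TRAIN_Y, query, k=3, distance=fn))
```

Because the distance function is passed in as an argument, the same classifier can be evaluated with each metric in turn. Metrics that measure only direction, such as cosine, may rank neighbors differently from magnitude-based metrics on the same data, which is one reason such comparisons can yield different accuracies.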