Do all roads lead to Rome? Studying distance measures in the context of machine learning

Bibliographic Details
Published in: Pattern recognition 2023-09, Vol.141, p.109646, Article 109646
Main Authors: Blanco-Mallo, Eva, Morán-Fernández, Laura, Remeseiro, Beatriz, Bolón-Canedo, Verónica
Format: Article
Language: English
Description
Summary:
• Review of the most commonly used distance measures in machine learning
• Analysis of their main properties, applications and key aspects to consider
• The similarity analysis shows a high degree of correlation between all the measures
• Evaluation of classification and clustering performance, noise tolerance and runtime
• Canberra distance shows the best overall performance and the highest tolerance to noise

Many machine learning and data mining tasks are based on distance measures, so a large amount of literature addresses this aspect somehow. Due to the broad scope of the topic, this paper aims to provide an overview of the use of these measures in the most common machine learning problems, pointing out those aspects to consider to choose the most appropriate measure for a particular task. For this purpose, the most recent works addressing the subject were reviewed and seven of the most commonly used measures were analyzed, investigating in detail their main properties and applications. Different experiments were carried out to study their relationships and compare their performance. The degradation of the results in the presence of noise was also considered, as well as the execution time required by each measure.
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2023.109646