
Correlation analysis of performance measures for multi-label classification

Bibliographic Details
Published in: Information Processing & Management, 2018-05, Vol. 54 (3), pp. 359-369
Main Authors: Pereira, Rafael B., Plastino, Alexandre, Zadrozny, Bianca, Merschmann, Luiz H.C.
Format: Article
Language:English
Description
Summary:
• Evaluation measures have been used arbitrarily in multi-label classification experiments, without an objective analysis of correlation or bias.
• A comprehensive and detailed analysis of the correlation between multi-label measures is presented.
• Hamming Loss is a highly recommended measure, as it is not correlated with the others and is the most widely employed measure in the literature.
• At least 12 out of 16 multi-label measures adopted in the literature are highly correlated with each other.

In many important application domains, such as text categorization, scene classification, biomolecular analysis and medical diagnosis, examples are naturally associated with more than one class label, giving rise to multi-label classification problems. This fact has led, in recent years, to a substantial amount of research in multi-label classification. To evaluate and compare multi-label classifiers, researchers have adapted evaluation measures from the single-label paradigm, such as Precision and Recall, and have also developed measures specific to the multi-label paradigm, such as Hamming Loss and Subset Accuracy. However, these evaluation measures have been used arbitrarily in multi-label classification experiments, without an objective analysis of correlation or bias. This can lead to misleading conclusions, as the experimental results may appear to favor a specific behavior depending on the subset of measures chosen. Moreover, because different papers in the area currently employ distinct subsets of measures, it is difficult to compare results across papers. In this work, we provide a thorough analysis of multi-label evaluation measures and give concrete suggestions to help researchers make an informed decision when choosing evaluation measures for multi-label classification.
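To make concrete the kind of measures the abstract contrasts, the following sketch (an illustration, not code from the paper) computes two of the multi-label measures it names — Hamming Loss and Subset Accuracy — on toy binary label matrices, where each row is an example and each column a label. The function names and toy data are this sketch's own, assumed for illustration.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label assignments that are wrong."""
    n_examples = len(y_true)
    n_labels = len(y_true[0])
    errors = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return errors / (n_examples * n_labels)


def subset_accuracy(y_true, y_pred):
    """Fraction of examples whose entire label set is predicted exactly."""
    exact = sum(true_row == pred_row
                for true_row, pred_row in zip(y_true, y_pred))
    return exact / len(y_true)


# Two examples, three labels each (hypothetical toy data).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0]]

print(hamming_loss(y_true, y_pred))     # 1 wrong bit out of 6 -> 0.1667
print(subset_accuracy(y_true, y_pred))  # 1 of 2 rows exactly right -> 0.5
```

The example shows why the two measures can disagree: a single wrong bit barely moves Hamming Loss but costs the whole example under Subset Accuracy — the kind of behavioral difference whose correlation structure the paper analyzes.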
ISSN: 0306-4573; 1873-5371
DOI: 10.1016/j.ipm.2018.01.002