How to measure similarity for multiple categorical data sets?

Bibliographic Details
Published in: Multimedia Tools and Applications, 2015-05, Vol. 74 (10), p. 3489-3505
Main Authors: Park, Simon Soon-Hyoung, Song, Justin JongSu, Lee, James Jung-Hoon, Lee, Wookey, Ree, Sangbok
Format: Article
Language: English
Description
Summary: How should similarity or distance be measured for multiple categorical data? Measuring similarity or distance between objects appropriately is an important step in the data mining and knowledge management process. Measures for continuous data are well defined and relatively easy to calculate. However, the notion of similarity for categorical data is not simple, since categorical data usually cannot be translated directly into a numerical format, and they also have their own priority, structure, and data distribution. In this paper, we propose a new measure for multiple categorical data sets that uses the data distribution. Our new measure, MCSM (Multiple Categorical Similarity Measure), addresses the conventional drawbacks of measures for multiple categorical data sets, and we verify it with mathematical proofs and experiments. The experimental results show that our measure is effective for multiple categorical data sets with proper data distributions.
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-014-1914-5