How to measure similarity for multiple categorical data sets?
Published in: | Multimedia tools and applications 2015-05, Vol.74 (10), p.3489-3505 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Summary: | How can similarity or distance be measured for multiple categorical data sets? Measuring similarity or distance between objects appropriately is an important step in data mining and knowledge management. Measures for continuous data are well defined and relatively easy to calculate. The notion of similarity for categorical data is less simple, however, because categorical data usually cannot be translated directly into a numerical format, and they have their own priorities in terms of structure and data distribution. In this paper, we propose a new measure for multiple categorical data sets that uses the data distribution. Our measure, MCSM (Multiple Categorical Similarity Measure), addresses the drawbacks of conventional measures for multiple categorical data sets; we verify it with mathematical proofs and experiments. The experimental results show that the measure is effective for multiple categorical data sets with proper data distributions. |
---|---|
ISSN: | 1380-7501 1573-7721 |
DOI: | 10.1007/s11042-014-1914-5 |
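
The summary contrasts plain matching with measures that take the data distribution into account. The record does not give the MCSM formula, so the sketch below does not implement it. It only illustrates the general idea under stated assumptions: `overlap_similarity` is ordinary simple matching, and `frequency_weighted_similarity` is a hypothetical weighting (a match counts as one minus the category's relative frequency) chosen purely for illustration. The function names and the toy `records` are invented for this example.

```python
from collections import Counter


def overlap_similarity(x, y):
    """Fraction of attributes on which two records share the same category."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def frequency_weighted_similarity(x, y, records):
    """Overlap in which a match on a rare category counts more than a common one.

    Illustrative weighting only (1 - relative frequency); this is NOT MCSM.
    """
    n = len(records)
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if a == b:
            # Distribution of attribute k across the whole data set.
            counts = Counter(r[k] for r in records)
            total += 1.0 - counts[a] / n
    return total / len(x)


# Toy data set: (colour, size). "red" is common, "large" is less common.
records = [
    ("red", "small"),
    ("red", "small"),
    ("red", "large"),
    ("blue", "large"),
]

same_colour = (records[0], records[2])  # agree only on "red"
same_size = (records[2], records[3])    # agree only on "large"

print(overlap_similarity(*same_colour), overlap_similarity(*same_size))
print(frequency_weighted_similarity(*same_colour, records),
      frequency_weighted_similarity(*same_size, records))
```

Simple matching scores both pairs identically because each pair agrees on one of two attributes. The weighted variant ranks the pair that shares the rarer category higher, which is the kind of distribution sensitivity the summary refers to.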