A New Distance Metric for Unsupervised Learning of Categorical Data

Bibliographic Details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, 2016-05, Vol.27 (5), p.1065-1079
Main Authors:	Jia, Hong; Cheung, Yiu-Ming; Liu, Jiming
Format:	Article
Language:	English
Subjects:	Atmospheric measurements; Attribute interdependence; Categorical attribute; Clustering analysis; Context; Distance metric; Frequency measurement; Hamming distance; Unsupervised learning

Description
Summary:	A distance metric is the basis of many learning algorithms, and its effectiveness usually has a significant influence on the learning results. Measuring distance for numerical data is generally tractable, but it can be a nontrivial problem for categorical data sets. This paper therefore presents a new distance metric for categorical data based on the characteristics of categorical values. In particular, the distance between two values of one attribute under this metric is determined by both the frequency probabilities of these two values and the values of other attributes that have high interdependence with the attribute being measured. A dynamic attribute weight is further designed to adjust the contribution of each attribute distance to the distance between whole data objects. Promising experimental results on different real data sets have shown the effectiveness of the proposed distance metric.
ISSN:	2162-237X 2162-2388
DOI:	10.1109/TNNLS.2015.2436432
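
Illustrative note: this record does not give the paper's formulas, so the sketch below is not the authors' metric. It only contrasts the baseline Hamming distance listed under Subjects with a simple frequency-based variant, to show what it means for value frequencies to enter a categorical distance. The function names and the mismatch cost (the sum of the two values' relative frequencies) are assumptions made for illustration. Attribute interdependence and dynamic attribute weights are not modeled.

```python
from collections import Counter


def hamming_distance(x, y):
    """Count the attributes on which two categorical objects differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def value_frequencies(data):
    """Return one dict per attribute mapping each value to its relative frequency."""
    n = len(data)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*data)]


def frequency_based_distance(x, y, freqs):
    """Hypothetical variant (an assumption, not the paper's formula):
    a mismatch on an attribute costs p(a) + p(b); a match costs 0."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    total = 0.0
    for a, b, p in zip(x, y, freqs):
        if a != b:
            total += p.get(a, 0.0) + p.get(b, 0.0)
    return total


# Toy data set: each object has two categorical attributes (color, size).
data = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("green", "small"),
]
freqs = value_frequencies(data)

d_hamming = hamming_distance(("red", "small"), ("blue", "large"))
d_freq = frequency_based_distance(("red", "small"), ("blue", "large"), freqs)
```

Under Hamming distance every mismatch counts the same, whereas the variant lets a mismatch between frequent values weigh more than one between rare values. The paper's metric additionally draws on interdependent attributes and attribute weights, which this sketch leaves out.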