Nearest neighbor classification of categorical data by attributes weighting
Published in: | Expert systems with applications 2015-04, Vol.42 (6), p.3142-3149 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • An effective solution for nearest neighbor classification on categorical data. • Two global attribute-weighting approaches applied for categorical data classification. • Two local attribute-weighting approaches applied for categorical data classification. • Strong results of the new classifiers compared with the traditional kNN and the decision tree. • Detailed analysis on the different behaviors of the various attribute-weighting methods.
Subspace classification of categorical data is an essential process for many real-world applications such as computer-aided medical diagnosis and collaborative recommendation. Nearest neighbor classifiers have attracted wide interest in these applications because of their simplicity and flexibility. However, they become ineffective when applied to categorical data, because there is no well-defined distance measure for computing dissimilarities between categorical samples in the projected subspaces. In this paper, we tackle the problem by defining a series of weighted distance functions for categorical attributes and applying them to derive new nearest neighbor classifiers. Four attribute-weighting measures are proposed: two based on global feature-ranking approaches and two on local approaches. Experimental results on real categorical data sets demonstrate that all four classifiers consistently outperform the traditional methods, and show the suitability of the proposal for real applications in terms of automated feature selection. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2014.12.002 |
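
The summary describes nearest neighbor classification with weighted distance functions over categorical attributes, but this record does not give the formulas. As a rough illustration of the general idea only, the sketch below weights a simple mismatch (overlap) distance by one global weight per attribute and uses it in a kNN vote. The choice of mutual information as the weighting measure, the function names, and the toy data are all assumptions made for this example; they are not taken from the article and should not be read as the authors' method.

```python
from collections import Counter
from math import log


def mutual_information(column, labels):
    """Mutual information (in nats) between one categorical attribute and the class."""
    n = len(labels)
    joint = Counter(zip(column, labels))
    px = Counter(column)
    py = Counter(labels)
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def global_weights(X, y):
    """One weight per attribute, normalised to sum to 1 (uniform if all scores are 0).

    Assumption: mutual information stands in for a global feature-ranking score.
    """
    d = len(X[0])
    mi = [mutual_information([row[j] for row in X], y) for j in range(d)]
    total = sum(mi)
    return [m / total for m in mi] if total > 0 else [1.0 / d] * d


def weighted_overlap(a, b, w):
    """Weighted mismatch distance: sum of the weights of attributes whose values differ."""
    return sum(wj for aj, bj, wj in zip(a, b, w) if aj != bj)


def knn_predict(X, y, query, w, k=3):
    """Majority vote among the k training samples closest to `query`."""
    order = sorted(range(len(X)), key=lambda i: weighted_overlap(X[i], query, w))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]


# Hypothetical toy data: attribute 0 determines the class, attribute 1 is noise.
X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
y = ["pos", "pos", "neg", "neg"]

w = global_weights(X, y)
prediction = knn_predict(X, y, ("a", "y"), w, k=1)
```

On this toy data the noisy attribute carries no information about the class, so its weight drops to zero and the distance effectively ignores it, which is the sense in which attribute weighting can act as automated feature selection. A local scheme would instead compute weights per class or per neighborhood rather than once for the whole data set.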