Nearest neighbor classification of categorical data by attributes weighting

Bibliographic Details
Published in: Expert Systems with Applications, 2015-04, Vol. 42 (6), pp. 3142-3149
Main Authors: Chen, Lifei; Guo, Gongde
Format: Article
Language: English
Description
Summary:
• An effective solution for nearest neighbor classification on categorical data.
• Two global attribute-weighting approaches applied to categorical data classification.
• Two local attribute-weighting approaches applied to categorical data classification.
• Strong results for the new classifiers compared with the traditional kNN and the decision tree.
• Detailed analysis of the different behaviors of the various attribute-weighting methods.

Subspace classification of categorical data is an essential process for many real-world applications such as computer-aided medical diagnosis and collaborative recommendation. Nearest neighbor classifiers have attracted wide interest in these applications because of their simplicity and flexibility. However, they become ineffective when applied to categorical data, because there is no well-defined distance measure for computing dissimilarities between categorical samples in the projected subspaces. In this paper, we tackle the problem by defining a series of weighted distance functions for categorical attributes and applying them to derive new nearest neighbor classifiers. Four attribute-weighting measures are proposed: two are defined on global feature-ranking approaches and the other two on local approaches. Experiments conducted on real categorical data sets demonstrate that all four classifiers consistently outperform the traditional methods, and show the suitability of the proposal for real applications in terms of automated feature selection.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2014.12.002
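
The summary describes nearest neighbor classification driven by attribute-weighted distances over categorical values. The record does not give the paper's four weighting measures, so the sketch below is only an illustration of the general idea, under stated assumptions: the weights come from mutual information between each attribute and the class (a stand-in global feature-ranking score, not one of the authors' measures), and the distance is a weighted count of mismatching attribute values. The function names `mutual_information_weights`, `weighted_mismatch_distance`, and `knn_predict`, and the toy data, are all hypothetical.

```python
from collections import Counter
from math import log


def mutual_information_weights(X, y):
    """Return one global weight per attribute, normalized to sum to 1.

    ASSUMPTION: mutual information with the class label is used as a
    stand-in ranking score; it is not one of the paper's four measures.
    """
    n = len(X)
    n_attrs = len(X[0])
    class_counts = Counter(y)
    weights = []
    for d in range(n_attrs):
        value_counts = Counter(row[d] for row in X)
        joint_counts = Counter((row[d], label) for row, label in zip(X, y))
        mi = 0.0
        for (value, label), c in joint_counts.items():
            p_joint = c / n
            # p(v, c) / (p(v) * p(c)) written with raw counts
            ratio = c * n / (value_counts[value] * class_counts[label])
            mi += p_joint * log(ratio)
        weights.append(max(mi, 0.0))  # guard against tiny negative rounding
    total = sum(weights)
    if total == 0:
        # No attribute is informative: fall back to uniform weights.
        return [1.0 / n_attrs] * n_attrs
    return [w / total for w in weights]


def weighted_mismatch_distance(a, b, weights):
    """Sum the weights of the attributes on which two samples disagree."""
    return sum(w for x, z, w in zip(a, b, weights) if x != z)


def knn_predict(X, y, query, weights, k=3):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(
        range(len(X)),
        key=lambda i: weighted_mismatch_distance(X[i], query, weights),
    )
    votes = Counter(y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): the class follows the first attribute,
# while the second attribute is mostly noise.
X_train = [
    ("red", "round"), ("red", "square"), ("blue", "round"),
    ("blue", "square"), ("red", "round"), ("blue", "square"),
]
y_train = ["A", "A", "B", "B", "A", "B"]

w = mutual_information_weights(X_train, y_train)
print(w)
print(knn_predict(X_train, y_train, ("red", "square"), w, k=3))
```

With weights of this kind, an attribute that carries little class information contributes little to the distance, which is the sense in which attribute weighting acts as a soft form of feature selection. A local variant would compute weights per class or per neighborhood rather than once for the whole training set.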