Feature selection based on difference and similitude in data mining

Bibliographic Details
Published in: Wuhan University Journal of Natural Sciences, 2007-05, Vol. 12 (3), pp. 467-470
Main Authors: Wu, Ming; Yan, Puliu
Format: Article
Language: Chinese; English
Description
Summary: Feature selection is a preprocessing step in data mining, and heuristic search algorithms are often used for it. Many of these algorithms are based on discernibility matrices, which consider only the differences in an information system. Because a discernibility matrix does not reveal similar characteristics, the resulting rules may not be the simplest. Although difference-similitude (DS) methods take both difference and similitude into account, the existing search strategy can cause some important features to be ignored. This paper proposes an improved DS-based algorithm to solve this problem. The improved algorithm defines an attribute rank function that considers both difference and similitude in feature selection. Experiments show that the algorithm is effective, especially for large-scale databases. Its time complexity is O(|C||U|).
ISSN: 1007-1202; 1993-4998
DOI: 10.1007/s11859-006-0077-2
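
Illustration: the summary contrasts discernibility matrices, which record only differences, with difference-similitude methods, which also record similarities. The following is a minimal Python sketch of that contrast, not the algorithm from the paper. The function name `difference_and_similitude`, the toy decision table, and the exact definitions used here (difference sets for object pairs with different decisions, similitude sets for pairs with the same decision) are assumptions made for illustration. The paper's attribute rank function is not described in this record, so it is not reproduced.

```python
from itertools import combinations


def difference_and_similitude(rows, decisions):
    """Build pairwise difference and similitude sets for a decision table.

    rows      : list of dicts mapping condition attribute -> value
    decisions : list of decision labels, one per row

    Assumed definitions (illustrative only):
      - difference[(i, j)]: attributes on which objects i and j differ,
        recorded when their decisions differ (a discernibility-matrix entry).
      - similitude[(i, j)]: attributes on which objects i and j agree,
        recorded when their decisions are the same.
    """
    difference = {}
    similitude = {}
    for i, j in combinations(range(len(rows)), 2):
        differing = {a for a in rows[i] if rows[i][a] != rows[j][a]}
        agreeing = set(rows[i]) - differing
        if decisions[i] != decisions[j]:
            difference[(i, j)] = differing
        else:
            similitude[(i, j)] = agreeing
    return difference, similitude


# Hypothetical toy decision table with two condition attributes.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": True},
]
decisions = ["play", "stay", "stay"]

difference, similitude = difference_and_similitude(rows, decisions)
print(difference)
print(similitude)
```

In this toy table the difference sets alone say that `windy` separates object 0 from object 1, while the similitude set adds that the two "stay" objects share the same `windy` value. This naive pairwise construction is quadratic in the number of objects and does not attempt to match the complexity reported in the summary.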