Collaboration graph for feature set partitioning in data classification

Bibliographic Details
Published in: Expert Systems with Applications, 2023-03, Vol. 213, p. 118988, Article 118988
Main Authors: Taheri, Khalil; Moradi, Hadi; Tavassolipour, Mostafa
Format: Article
Language:English
Description
Summary:
• A measure is defined to show how effective each pair of features is in classification.
• A Collaboration Graph (CG) represents the measure as an edge between each pair of features.
• Community detection is applied to the CG to identify informative feature subsets.
• The approach has been tested successfully on real and synthetic data.

The curse of dimensionality in data classification is still an open issue. One approach to this problem is to partition the features into several subsets and perform the classification task on each subset. An ensemble of these classifications is then reported as the result of the classification problem. However, how to partition the feature set into subsets is still an area of research interest. In this paper, an innovative framework is proposed in which, first, a collaboration measure between each pair of features is defined and computed. Then the collaboration graph, consisting of features as nodes and the measured collaborations as edge weights, is generated from these measures. Next, a community detection method is used to find the communities of the graph. The communities are taken as the feature subsets, and a base classifier is trained for each subset on the corresponding training data. Finally, the ensemble classifier is created by combining the base classifiers according to AdaBoost aggregation. Simulation results on real and synthetic datasets indicate that the proposed approach considerably increases classification accuracy in comparison to previous methods.
ISSN: 0957-4174
EISSN: 1873-6793
DOI:10.1016/j.eswa.2022.118988