A novel extreme learning machine based kNN classification method for dealing with big data

Bibliographic Details
Published in: Expert systems with applications 2021-11, Vol.183, p.115293, Article 115293
Main Authors: Shokrzade, Amin; Ramezani, Mohsen; Akhlaghian Tab, Fardin; Abdulla Mohammad, Mahmud
Format: Article
Language: English
Description
Summary:
•Some ELMs are used in parallel as experts to map data into a discriminative space.
•The training data are easily grouped into clusters in the new feature space.
•An index is calculated per group to help find the corresponding groups of a sample.
•A tree is used to find the corresponding groups of a sample, and kNN is applied to them.
•The method is evaluated on 4 big datasets in the absence and presence of noise.

The kNN algorithm, an effective data mining technique, is widely used for supervised classification. However, previously proposed kNN search methods are not efficient enough for big data. As big datasets are generated and expanded daily on online and offline servers, efficient methods for finding kNN in such data are needed. Moreover, massive amounts of data contain more noisy and imperfect samples, which significantly increases the need for a robust kNN search method. This paper introduces a new fast and robust kNN search framework for big datasets. In this method, a group of the data samples most relevant to an input sample is detected, and the original kNN method is applied to that group to find the final nearest neighbors. The main goal is to handle big datasets accurately, quickly, and robustly. The training samples of each label are grouped into partitions based on the outputs of several mini-classifiers (i.e., ELM classifiers); that is, the behavior of the mini-classifiers is the basis for partitioning the training samples. The mini-classifiers are trained on non-overlapping subsets of the training set, one subset per mini-classifier. An index is calculated for each partition so that the corresponding partition can be found quickly using a tree structure in which each partition index falls into a leaf. Then, the outputs of the mini-classifiers for an input test sample are used to find, on the tree, the corresponding group of training samples most relevant to that input. Experimental results indicate that the proposed method performs better in most cases, and comparably in the others, on original and noisy big data problems.
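
The pipeline described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the partition index is computed or how the tree is built, so the sketch assumes that a sample's "index" is simply the tuple of labels predicted by the mini-classifiers, and it uses a dictionary lookup as a stand-in for the tree. The names `TinyELM` and `ELMGroupedKNN`, the tanh activation, the hidden-layer size, and the fallback to the full training set for unseen indices are all hypothetical choices made for illustration. Labels are assumed to be integers 0..C-1.

```python
import numpy as np
from collections import Counter, defaultdict


class TinyELM:
    """Basic ELM: random, fixed hidden layer; output weights by least squares."""

    def __init__(self, n_hidden, n_classes, rng):
        self.n_hidden, self.n_classes, self.rng = n_hidden, n_classes, rng

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(self.n_classes)[y]  # one-hot targets
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


class ELMGroupedKNN:
    """Group training samples by mini-classifier outputs, then run kNN per group."""

    def __init__(self, n_experts=3, n_hidden=20, k=5, seed=0):
        self.n_experts, self.n_hidden, self.k, self.seed = n_experts, n_hidden, k, seed

    def _signatures(self, X):
        # Assumed index: the tuple of labels predicted by all mini-classifiers.
        votes = np.stack([e.predict(X) for e in self.experts], axis=1)
        return [tuple(int(v) for v in row) for row in votes]

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.X, self.y = np.asarray(X, dtype=float), np.asarray(y)
        n_classes = int(self.y.max()) + 1
        # Non-overlapping subsets of the training set, one per mini-classifier.
        chunks = np.array_split(rng.permutation(len(self.X)), self.n_experts)
        self.experts = [
            TinyELM(self.n_hidden, n_classes, rng).fit(self.X[c], self.y[c])
            for c in chunks
        ]
        # Dictionary lookup stands in for the tree over partition indices.
        self.groups = defaultdict(list)
        for i, sig in enumerate(self._signatures(self.X)):
            self.groups[sig].append(i)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        predictions = []
        for x, sig in zip(X, self._signatures(X)):
            members = self.groups.get(sig)
            # Assumed fallback: search the whole training set for unseen indices.
            idx = np.arange(len(self.X)) if members is None else np.asarray(members)
            dists = np.linalg.norm(self.X[idx] - x, axis=1)
            nearest = idx[np.argsort(dists)[: self.k]]
            predictions.append(Counter(self.y[nearest].tolist()).most_common(1)[0][0])
        return np.array(predictions)
```

Under these assumptions, `ELMGroupedKNN().fit(X_train, y_train).predict(X_test)` runs brute-force kNN only inside the group that shares the query's mini-classifier outputs, which is where the speed-up over searching the full training set would come from.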
ISSN: 0957-4174; 1873-6793
DOI: 10.1016/j.eswa.2021.115293