A hybrid approach for mismatch data reduction in datasets and guide data mining
Published in: | Cluster computing 2019-09, Vol.22 (Suppl 5), p.10605-10614 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | An outlier is a data point, or set of data points, that differs distinctly from the rest of the data in a dataset, which is defined as normal. Outlier detection is an active area of research in data mining. When clustering methods are used, the elements lying outside the clusters are flagged as outliers. This is not sufficient, however, because a few unknown elements can become part of a cluster. To remove irrelevant data from the dataset completely, it is therefore necessary to identify and eliminate the outliers that have merged into the clusters. An efficient hybrid approach is proposed to reduce the number of outliers. It employs two data mining algorithms, multilayer neural networks (MLN) and weighted K-means, to identify outliers in a data group, and it guides the process toward better cluster formation. Each element in the dataset is provided as input to the MLN after weights have been assigned by weighted K-means. The MLN is trained to reproduce the normal input data (inliers) and ensures that the groups formed by weighted K-means consist of inliers only. Among the outlier detection methods presented in the data mining literature, the proposed method is based on Integrating Semantic Knowledge. This method identifies a data point as an outlier when its behaviour differs from that of the other data elements belonging to the same cluster or class. The principal intention of this research is to reduce the number of outliers by enhancing the performance of clustering or classification techniques, which in turn improves accuracy and reduces the mean square error. The test results provide evidence of the superiority of the proposed strategy in reducing outliers. |
---|---|
ISSN: | 1386-7857 1573-7543 |
DOI: | 10.1007/s10586-017-1137-4 |
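
The summary describes outliers that are absorbed into a cluster and must be found by comparing each element with the other members of the same cluster. The record gives no algorithmic detail, so the following is only a minimal sketch under stated assumptions, not the authors' method: it uses a per-point weighted variant of Lloyd's K-means, and it replaces the MLN reconstruction check with a simple rule that flags members lying much farther from their centroid than the cluster's median member. All function names, the `factor` threshold, the uniform starting weights, and the toy data are hypothetical.

```python
import math

def weighted_kmeans(points, weights, centers, iters=20):
    """Lloyd's algorithm in which each point pulls its centroid in
    proportion to its weight (an assumed reading of 'weighted K-means')."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: weighted mean of each cluster's members.
        for k in range(len(centers)):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == k]
            total = sum(w for _, w in members)
            if total > 0:  # leave the centroid in place if its cluster has no weight
                centers[k] = tuple(
                    sum(w * p[d] for p, w in members) / total
                    for d in range(len(centers[k]))
                )
    return centers, labels

def flag_in_cluster_outliers(points, centers, labels, factor=3.0):
    """Stand-in for the MLN check: flag a member whose distance to its
    centroid exceeds `factor` times the (upper) median distance in its cluster."""
    dists = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    flags = []
    for i, lab in enumerate(labels):
        cluster = sorted(d for d, m in zip(dists, labels) if m == lab)
        median = cluster[len(cluster) // 2]
        flags.append(dists[i] > factor * median)
    return flags

# Toy data: two tight groups plus one point, (3.0, 0.5), that K-means
# absorbs into the first cluster even though it behaves differently.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (3.0, 0.5),
          (10.0, 10.0), (10.2, 9.9), (9.9, 10.1), (10.1, 10.3)]
uniform = [1.0] * len(points)  # assumed starting weights

centers, labels = weighted_kmeans(points, uniform, [(0.0, 0.0), (10.0, 10.0)])
flags = flag_in_cluster_outliers(points, centers, labels)

# Re-cluster with flagged points given zero weight so that the
# centroids are formed from the remaining inliers only.
clean_weights = [0.0 if f else 1.0 for f in flags]
clean_centers, _ = weighted_kmeans(points, clean_weights, centers)
```

On this toy data only the point (3.0, 0.5) is flagged, and zeroing its weight moves the first centroid back onto the four tightly grouped points. A neural network trained to reproduce inliers, as the summary describes, would take the place of the median-distance rule, with a high reconstruction error playing the role of the distance threshold.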