Loading…

New Method Based Pre-Processing to Tackle Missing and High Dimensional Data of CRISP-DM Approach

The kidneys are one of the most important organs including the excretion system in humans. The kidneys are responsible for maintaining blood concentrations to remain constant (homeostatic) and help to control blood pressure (BP). If the task of the kidney is not functioning properly it will cause ki...

Full description

Saved in:
Bibliographic Details
Published in:Journal of physics. Conference series 2020-02, Vol.1471 (1), p.12012
Main Authors: Suntoro, Joko, Ilham, Ahmad, Rani, Handini Arga Damar
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The kidneys are one of the most important organs including the excretion system in humans. The kidneys are responsible for maintaining blood concentrations to remain constant (homeostatic) and help to control blood pressure (BP). If the task of the kidney is not functioning properly it will cause kidney failure. In the past decade, data mining methods have been used to diagnose kidney failure. The dataset used to predict kidney failure was successfully summarized by Soundarapandian, and was named the Chronic Kidney Disease (CKD) dataset. But the data in the CKD dataset contains missing value and high dimension data (original data) so that it affects the evaluation results on classification. This research proposes methods in preprocessing data, namely modus in every class (MEC) method to solve missing value problems, and the weight information gain (WIG) method for solving high dimensional data problems, the proposed method is named the MEC + WIG method. The MEC + WIG method will be compared with the original method and the MEC method and evaluated based on the accuracy of the traditional classification method (k-NN, Naïve Bayes, C4.5, and CART). The results showed that the average accuracy of the MEC + WIG method was better than the original method and the MEC method, with the average accuracy of the MEC + WIG method at 98.13%, while the average value of the accuracy of the original method and MEC respectively amounting to 88.56% and 92.88%. There were significant differences between the three methods when tested using Friedman test with a p-value of 0.02. It can be concluded that the MEC + WIG method can improve the performance of traditional methods k-NN, Naive Bayes, C4.5 and CART by overcoming the problem of missing value and data high dimension.
ISSN:1742-6588
1742-6596
DOI:10.1088/1742-6596/1471/1/012012