
SMOTE technique in comparison of gradient boosted trees with Naive Bayes for adaboost implementation in imbalance problem solving class on village status prediction

Bibliographic Details
Main Authors: Firmansyah, Hasbi; Budiraharjo, Eko; Sofyan, Ali
Format: Conference Proceeding
Language: English
Description
Summary: The Ministry of Villages PDTT, in collaboration with the National Development Planning Agency and the Central Statistics Agency, issued Village Potential data in 2015 (Podes 2015), covering 74,093 villages with 42 indicators/attributes and the Podes status used as the class label. The Podes 2015 dataset, however, is known to have an unbalanced class distribution. Class imbalance of this kind can degrade an algorithm's performance: when the minority class is much smaller than the majority class, predictions tend to favor the majority class. The gradient boosted decision tree method performs well in classification by combining simple parameter functions with "weak" results (high prediction errors) to build highly accurate predictions. However, gradient boosted decision trees cannot be applied to regression problems with a small data distribution; they require large amounts of data, in which complex interactions are modelled simply. This problem can be addressed with a method that balances the classes and improves accuracy. AdaBoost is a boosting method that can counteract imbalance by reweighting misclassified instances, which changes the effective data distribution, while SMOTE is a well-known method for dealing with class imbalance. SMOTE synthesizes new minority-class samples to balance the dataset, creating each new minority-class instance as a convex combination of neighboring minority instances. Experiments applied the AdaBoost method to the gradient boosted decision tree (GBDT) to obtain optimal results and a good level of accuracy.
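The SMOTE step described above, synthesizing a new minority-class instance as a convex combination of a minority point and one of its nearest minority neighbors, can be sketched as follows. This is an illustrative minimal implementation, not the authors' code; the function names (`smote_sample`, `simple_smote`) and the neighbor count `k` are assumptions for the example.

```python
import numpy as np

def smote_sample(x_i, x_neighbor, rng):
    # New synthetic point on the line segment between a minority instance
    # and one of its minority-class neighbors (a convex combination).
    lam = rng.random()  # interpolation factor in [0, 1)
    return x_i + lam * (x_neighbor - x_i)

def simple_smote(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples (illustrative SMOTE)."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Euclidean distances from X_min[i] to every minority point.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbors)
        out.append(smote_sample(X_min[i], X_min[j], rng))
    return np.array(out)
```

Because each synthetic sample is a convex combination of two existing minority points, all generated samples stay inside the convex hull of the minority class rather than being arbitrary noise.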
The experimental results of the proposed method, the SMOTE technique on the gradient boosted decision tree with AdaBoost, are an accuracy of 88.91% and a classification error of 11.09%, compared with the naïve Bayes algorithm, which achieves only 40.16% accuracy and a 59.84% classification error. Finally, from the kappa measurements it can be concluded that, in determining village status, the SMOTE technique with gradient boosted decision trees and AdaBoost is proven to solve the class imbalance problem, increase accuracy, and reduce the classification error.
ISSN: 0094-243X; 1551-7616
DOI: 10.1063/5.0211898