An Improved Random Forest Algorithm for Class-Imbalanced Data Classification and its Application in PAD Risk Factors Analysis
Published in: | The open electrical and electronic engineering journal 2013-06, Vol.7 (1), p.62-70 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | Classification is one of the important research subjects in machine learning. However, most machine learning algorithms train a classifier on the assumption that the classes have roughly equal numbers of training examples. When a classifier is trained on imbalanced data, its performance declines markedly. To address the class-imbalance problem, an improved random forest algorithm based on sampling with replacement was proposed. Multiple example subsets were drawn at random, with replacement, from the majority class, each containing as many examples as the minority class dataset. Each extracted majority subset was then combined with the minority class dataset to construct a new training dataset, and a random forest classifier was trained on each of these datasets. The class of a new example was determined by majority voting over the random forest classifiers. Experimental results on five UCI datasets and a real clinical dataset show that the proposed method can handle class-imbalanced data and that the improved random forest algorithm outperforms the original random forest and other methods reported in the literature. |
---|---|
ISSN: | 1874-1290 |
DOI: | 10.2174/1874129001307010062 |
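The resampling-and-voting scheme described in the summary can be sketched in pure Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a binary problem with scalar labels, and it uses a deliberately small hand-rolled random forest (a bootstrap sample per tree, a random subset of about √d features per split, Gini-based splits, depth-limited trees) in place of a library implementation. All function names and default values (`balanced_sets`, `train_balanced_forests`, `predict_balanced`, `n_forests=5`, `n_trees=15`, `max_depth=4`) are hypothetical.

```python
# Hypothetical sketch: function names and defaults are illustrative, not from the paper.
# Assumes binary classification with scalar labels (int or str).
import random
from collections import Counter


def vote(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Depth-limited binary tree with Gini splits; leaves are labels, inner nodes tuples."""
    if depth >= max_depth or len(set(y)) == 1:
        return vote(y)
    best = None  # (weighted Gini impurity, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset per node
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no threshold splits these rows
        return vote(y)
    _, f, t = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feats, rng)

    return (f, t,
            grow([i for i, row in enumerate(X) if row[f] <= t]),
            grow([i for i, row in enumerate(X) if row[f] > t]))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_forest(X, y, n_trees, max_depth, rng):
    """Plain random forest: each tree is grown on a bootstrap sample of the rows."""
    n_feats = max(1, int(len(X[0]) ** 0.5))  # about sqrt(d) features per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest


def predict_forest(forest, x):
    return vote([predict_tree(tree, x) for tree in forest])


def balanced_sets(X, y, minority_label, n_sets, rng):
    """Combine the whole minority class with n_sets majority subsets drawn with replacement."""
    minority = [(x, lab) for x, lab in zip(X, y) if lab == minority_label]
    majority = [(x, lab) for x, lab in zip(X, y) if lab != minority_label]
    sets = []
    for _ in range(n_sets):
        # Each subset has as many majority examples as the minority class has.
        rows = [rng.choice(majority) for _ in range(len(minority))] + minority
        sets.append(([r[0] for r in rows], [r[1] for r in rows]))
    return sets


def train_balanced_forests(X, y, minority_label, n_forests=5, n_trees=15,
                           max_depth=4, seed=0):
    rng = random.Random(seed)
    return [train_forest(Xs, ys, n_trees, max_depth, rng)
            for Xs, ys in balanced_sets(X, y, minority_label, n_forests, rng)]


def predict_balanced(forests, x):
    """Final class = majority vote over the per-forest predictions."""
    return vote([predict_forest(forest, x) for forest in forests])
```

Usage: call `train_balanced_forests(X, y, minority_label=1)` on a list of feature rows `X` and labels `y`, then `predict_balanced(forests, x)` for a new row `x`. Odd values of `n_trees` and `n_forests` avoid tied votes in the binary case. In practice a library random forest (for example scikit-learn's `RandomForestClassifier`) would likely replace the hand-rolled trees; only the balanced resampling and the final vote across forests are specific to the scheme summarized above.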