Loading…

Using Coding-Based Ensemble Learning to Improve Software Defect Prediction

Using classification methods to predict software defect proneness with static code attributes has attracted a great deal of attention. The class-imbalance characteristic of software defect data makes the prediction much difficult; thus, a number of methods have been employed to address this problem....

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on human-machine systems 2012-11, Vol.42 (6), p.1806-1817
Main Authors: Sun, Zhongbin, Song, Qinbao, Zhu, Xiaoyan
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Using classification methods to predict software defect proneness with static code attributes has attracted a great deal of attention. The class-imbalance characteristic of software defect data makes the prediction much difficult; thus, a number of methods have been employed to address this problem. However, these conventional methods, such as sampling, cost-sensitive learning, Bagging, and Boosting, could suffer from the loss of important information, unexpected mistakes, and overfitting because they alter the original data distribution. This paper presents a novel method that first converts the imbalanced binary-class data into balanced multiclass data and then builds a defect predictor on the multiclass data with a specific coding scheme. A thorough experiment with four different types of classification algorithms, three data coding schemes, and six conventional imbalance data-handling methods was conducted over the 14 NASA datasets. The experimental results show that the proposed method with a one-against-one coding scheme is averagely superior to the conventional methods.
ISSN:1094-6977
2168-2291
1558-2442
2168-2305
DOI:10.1109/TSMCC.2012.2226152