Loading…
Predicting Faults in High Assurance Software
Reducing the number of latent software defects is a development goal that is particularly applicable to high assurance software systems. For such systems, the software measurement and defect data is highly skewed toward the not-fault-prone program modules, i.e., the number of fault-prone modules is...
Saved in:
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Reducing the number of latent software defects is a development goal that is particularly applicable to high assurance software systems. For such systems, the software measurement and defect data is highly skewed toward the not-fault-prone program modules, i.e., the number of fault-prone modules is relatively very small. The skewed data problem, also known as class imbalance, poses a unique challenge when training a software quality estimation model. However, practitioners and researchers often build defect prediction models without regard to the skewed data problem. In high assurance systems, the class imbalance problem must be addressed when building defect predictors. This study investigates the roughly balanced bagging (RBBag) algorithm for building software quality models with data sets that suffer from class imbalance. The algorithm combines bagging and data sampling into one technique. A case study of 15 software measurement data sets from different real-world high assurance systems is used in our investigation of the RBBag algorithm. Two commonly used classification algorithms in the software engineering domain, Naive Bayes and C4.5 decision tree, are combined with RBBag for building the software quality models. The results demonstrate that defect prediction models based on the RBBag algorithm significantly outperform models built without any bagging or data sampling. The RBBag algorithm provides the analyst with a tool for effectively addressing class imbalance when training defect predictors during high assurance software development. |
---|---|
ISSN: | 1530-2059 2640-7507 |
DOI: | 10.1109/HASE.2010.29 |