TSTSS: A two-stage training subset selection framework for cross version defect prediction
Published in: | The Journal of systems and software 2019-08, Vol.154, p.59-78 |
---|---|
Format: | Article |
Language: | English |
Summary: | • Proposes a novel method to select an optimal subset of modules to train cross-version defect prediction models.
• Relies on a weighted classifier to build the classification model.
• Conducts a large-scale empirical study, achieving encouraging performance.
Cross Version Defect Prediction (CVDP) is a practical scenario in which a classification model is trained on the historical data of the prior version and then used to predict the defect labels of modules in the current version. Unfortunately, differences in data distribution across versions may hinder the effectiveness of the trained CVDP model. Selecting a suitable training subset from the prior version to improve CVDP performance is therefore not trivial. In this paper, we propose a novel method, called Two-Stage Training Subset Selection (TSTSS), to address this issue. In the first stage, TSTSS uses a sparse modeling representative selection method to select an initial module subset from the prior version that can reconstruct the data of the prior version well. In the second stage, TSTSS uses a dissimilarity-based sparse subset selection method to further refine the selected module subset, so that the selected modules represent the modules of the current version well. Finally, we use a novel weighted extreme learning machine classifier to construct the CVDP model. We evaluate the CVDP performance of TSTSS on 50 cross-version pairs using 6 indicators. The experiments show that TSTSS can efficiently improve CVDP performance compared with 11 baseline methods. |
---|---|
ISSN: | 0164-1212 1873-1228 |
DOI: | 10.1016/j.jss.2019.03.027 |
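
The summary names a weighted extreme learning machine as the final classifier but gives no details of it. As a rough illustration only, the sketch below implements the generic class-weighted ELM idea: a random, untrained hidden layer followed by a closed-form ridge solution in which each training module is weighted by the inverse of its class size. It is not the authors' novel variant, and the function names, the sigmoid activation, the `n_hidden` and `C` defaults, and the weighting scheme are all assumptions made for this example.

```python
import numpy as np


def train_weighted_elm(X, y, n_hidden=50, C=1.0, seed=0):
    """Fit a generic class-weighted ELM for binary labels y in {0, 1}.

    Illustrative sketch only; not the classifier proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    # Random input weights and biases are drawn once and never trained.
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))  # sigmoid hidden activations
    # Assumed weighting: each sample gets 1 / (size of its class), so the
    # minority (defective) class carries the same total weight as the majority.
    counts = np.bincount(y, minlength=2)
    w = 1.0 / counts[y]
    t = np.where(y == 1, 1.0, -1.0)  # regression targets in {-1, +1}
    # Closed-form weighted ridge solution:
    #   beta = (I / C + H^T W H)^{-1} H^T W t,  with W = diag(w)
    HtW = H.T * w  # scales column i of H^T (sample i) by w[i]
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ t)
    return W_in, b, beta


def predict_weighted_elm(model, X):
    """Return 0/1 defect labels from the sign of the ELM output."""
    W_in, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ W_in + b)))
    return (H @ beta > 0).astype(int)
```

In a cross-version setting, `X` and `y` would be the metrics and labels of the modules selected from the prior version, and `predict_weighted_elm` would be applied to the modules of the current version. The two subset selection stages are not shown here.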