TLEL: A two-layer ensemble learning approach for just-in-time defect prediction

Bibliographic Details
Published in: Information and Software Technology, 2017-07, Vol. 87, p. 206-220
Main Authors: Yang, Xinli, Lo, David, Xia, Xin, Sun, Jianling
Format: Article
Language: English
Description
Summary:
• We propose a novel approach, TLEL, which can be seen as a two-layer ensemble learning technique, to achieve better performance on the just-in-time defect prediction problem.
• We compare TLEL with three baselines, i.e., Deeper, DNC and MKEL, on six large software projects.
• The experimental results show that our approach achieves a substantial improvement over all of them. Moreover, TLEL could discover over 70% of the bugs by reviewing only 20% of the lines of code.

Defect prediction is a very meaningful topic, particularly at the change level. Change-level defect prediction, which is also referred to as just-in-time defect prediction, can not only ensure software quality during the development process, but also let developers check and fix defects in time [1].

Ensemble learning has become a hot topic in recent years, and several studies have applied it to defect prediction [2–5]. Traditional ensemble learning approaches have only one layer, i.e., they use ensemble learning once. Few studies leverage ensemble learning twice or more. To bridge this research gap, we hybridize various ensemble learning methods to see whether doing so improves the performance of just-in-time defect prediction. In particular, we focus on one way to do this, hybridizing bagging and stacking, and leave other possible hybridization strategies for future work.

In this paper, we propose a two-layer ensemble learning approach, TLEL, which leverages decision trees and ensemble learning to improve the performance of just-in-time defect prediction. In the inner layer, we combine decision trees and bagging to build a Random Forest model. In the outer layer, we use random under-sampling to train many different Random Forest models and use stacking to ensemble them once more. To evaluate the performance of TLEL, we use two metrics, i.e., cost effectiveness and F1-score.
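The two layers described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn and NumPy, binary labels (1 = defective change), and hypothetical function names and parameter values (`n_forests`, `n_trees`). The abstract says only that the outer layer uses stacking; a plain majority vote stands in for that combiner here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def random_undersample(X, y, rng):
    """Return a class-balanced subset by randomly dropping majority-class rows."""
    idx_pos = np.flatnonzero(y == 1)
    idx_neg = np.flatnonzero(y == 0)
    n = min(len(idx_pos), len(idx_neg))
    keep = np.concatenate([
        rng.choice(idx_pos, size=n, replace=False),
        rng.choice(idx_neg, size=n, replace=False),
    ])
    return X[keep], y[keep]


def fit_two_layer(X, y, n_forests=10, n_trees=10, seed=0):
    """Outer layer: train several forests, each on its own under-sampled subset.

    Inner layer: each RandomForestClassifier is bagging over decision trees.
    n_forests and n_trees are illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    forests = []
    for _ in range(n_forests):
        X_sub, y_sub = random_undersample(X, y, rng)
        forest = RandomForestClassifier(
            n_estimators=n_trees,
            random_state=int(rng.integers(1 << 31)),
        )
        forests.append(forest.fit(X_sub, y_sub))
    return forests


def predict_two_layer(forests, X):
    """Combine the forests by majority vote (assumed stand-in for stacking)."""
    votes = np.stack([f.predict(X) for f in forests])  # (n_forests, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because every forest sees a different balanced subset, the outer layer addresses class imbalance while the inner layer reduces the variance of individual trees.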
We perform experiments on datasets from six large open-source projects, i.e., Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL, containing a total of 137,417 changes. We also compare our approach with three baselines: Deeper, the approach proposed by us [6]; DNC, the approach proposed by Wang et al. [2]; and MKEL, the approach proposed by Wang et al. [3]. The experimental results show that, on average across the six datasets, TLEL could discover over 70% of the bugs by reviewing only 20% of the lines of code, as compared with about 50% for the baselines. In
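The two evaluation metrics named in the summary can be illustrated with a short sketch. The exact procedure is not given in this record, so the code below assumes that changes are reviewed in descending order of predicted risk and that review stops at the first change that would exceed the budget of 20% of all changed lines; the function names and the toy data are hypothetical.

```python
def recall_at_loc_budget(scores, loc, buggy, budget=0.20):
    """Fraction of buggy changes found within a lines-of-code review budget.

    Changes are reviewed in descending score order (an assumed ranking rule)
    until the next change would push the inspected lines past `budget` of
    the total changed lines.
    """
    total_loc = sum(loc)
    total_bugs = sum(buggy)
    if total_loc == 0 or total_bugs == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    inspected = 0
    found = 0
    for i in order:
        if inspected + loc[i] > budget * total_loc:
            break
        inspected += loc[i]
        found += buggy[i]
    return found / total_bugs


def f1_score(predicted, actual):
    """Harmonic mean of precision and recall for binary labels (1 = buggy)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: five changes with predicted risk, changed lines, and true labels.
scores = [0.9, 0.8, 0.3, 0.2, 0.1]
loc = [10, 5, 30, 30, 25]
buggy = [1, 1, 0, 1, 0]
cost_effectiveness = recall_at_loc_budget(scores, loc, buggy)
f1 = f1_score([1, 1, 0, 0, 0], buggy)
```

In the toy data, the two highest-ranked changes together account for 15 of 100 changed lines, so both fit inside the 20% budget and two of the three buggy changes are found.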
ISSN: 0950-5849, 1873-6025
DOI: 10.1016/j.infsof.2017.03.007