
Using Coding-Based Ensemble Learning to Improve Software Defect Prediction

Using classification methods to predict software defect proneness with static code attributes has attracted a great deal of attention. The class-imbalance characteristic of software defect data makes prediction much more difficult; thus, a number of methods have been employed to address this problem....


Bibliographic Details
Published in: IEEE transactions on human-machine systems 2012-11, Vol.42 (6), p.1806-1817
Main Authors: Sun, Zhongbin; Song, Qinbao; Zhu, Xiaoyan
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c358t-febbb81136e4f4dacbc9480b38e6a40b49c60fb98d62786a55279d6fe5f8e1d43
cites cdi_FETCH-LOGICAL-c358t-febbb81136e4f4dacbc9480b38e6a40b49c60fb98d62786a55279d6fe5f8e1d43
container_end_page 1817
container_issue 6
container_start_page 1806
container_title IEEE transactions on human-machine systems
container_volume 42
creator Sun, Zhongbin
Song, Qinbao
Zhu, Xiaoyan
description Using classification methods to predict software defect proneness with static code attributes has attracted a great deal of attention. The class-imbalance characteristic of software defect data makes prediction much more difficult; thus, a number of methods have been employed to address this problem. However, these conventional methods, such as sampling, cost-sensitive learning, Bagging, and Boosting, could suffer from the loss of important information, unexpected mistakes, and overfitting because they alter the original data distribution. This paper presents a novel method that first converts the imbalanced binary-class data into balanced multiclass data and then builds a defect predictor on the multiclass data with a specific coding scheme. A thorough experiment with four different types of classification algorithms, three data coding schemes, and six conventional imbalanced-data handling methods was conducted over the 14 NASA datasets. The experimental results show that the proposed method with a one-against-one coding scheme is, on average, superior to the conventional methods.
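The two steps named in the description (relabel the imbalanced binary data as balanced multiclass data, then combine binary learners under a one-against-one coding scheme) might be sketched as follows. This is an illustration based only on the abstract, not the authors' implementation: the random partitioning of the majority class, the nearest-centroid stand-in for a base classifier, and every function name here are assumptions.

```python
import random
from collections import Counter
from itertools import combinations


def to_balanced_multiclass(labels, minority=1, seed=0):
    """Relabel rows so that every class is roughly minority-sized.

    Minority (defective) rows become class 0; majority rows are split
    at random into k pseudo-classes 1..k. Random splitting is an
    assumption made for this sketch.
    """
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    k = max(1, round(len(majority_idx) / max(1, len(minority_idx))))
    rng = random.Random(seed)
    rng.shuffle(majority_idx)
    new_labels = [0] * len(labels)
    for pos, i in enumerate(majority_idx):
        new_labels[i] = 1 + pos % k
    return new_labels


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_one_vs_one(X, multi_labels):
    """One-against-one coding: one binary learner per pair of classes.

    Each 'learner' is just a pair of class centroids, a hypothetical
    stand-in for a real base classifier.
    """
    cents = {c: centroid([x for x, y in zip(X, multi_labels) if y == c])
             for c in set(multi_labels)}
    return [(a, b, cents[a], cents[b])
            for a, b in combinations(sorted(cents), 2)]


def predict_defective(model, x):
    """Let every pairwise learner vote; class 0 winning means defective."""
    votes = Counter()
    for a, b, ca, cb in model:
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    return votes.most_common(1)[0][0] == 0


# Toy data: two defective modules far from six non-defective ones.
X = [(10, 10), (9, 10), (0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.2), (0.2, 0.5)]
y = [1, 1, 0, 0, 0, 0, 0, 0]
multi = to_balanced_multiclass(y)
model = fit_one_vs_one(X, multi)
```

Calling `predict_defective(model, (9, 9))` asks each pairwise learner for a vote and maps the winning class back to a binary answer. Because the original rows are only relabeled, never dropped or duplicated, the data distribution itself is left unchanged, which is the property the description contrasts with sampling-based methods.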
doi_str_mv 10.1109/TSMCC.2012.2226152
format article
fullrecord <record><control><sourceid>proquest_pasca</sourceid><recordid>TN_cdi_pascalfrancis_primary_26818943</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6392473</ieee_id><sourcerecordid>2851366401</sourcerecordid><originalsourceid>FETCH-LOGICAL-c358t-febbb81136e4f4dacbc9480b38e6a40b49c60fb98d62786a55279d6fe5f8e1d43</originalsourceid><addsrcrecordid>eNpdkE1Lw0AQhoMoWKt_QC8BEbyk7nd2jxqrVioKbc_LZjMrKfmou6nivzexxYOnGZjnfRmeKDrHaIIxUjfLxUuWTQjCZEIIEZiTg2iEOZcJYYwc9jtSLBEqTY-jkxDWCGHGFB1Fz6tQNu9x1hb9SO5MgCKeNgHqvIJ4DsY3w7lr41m98e0nxIvWdV_GQ3wPDmwXv3koStuVbXMaHTlTBTjbz3G0epgus6dk_vo4y27niaVcdomDPM8lxlQAc6wwNreKSZRTCcIwlDNlBXK5koUgqRSGc5KqQjjgTgIuGB1H17ve_qGPLYRO12WwUFWmgXYbNKaYi5RQmvbo5T903W5903-ncS9GMInRUEh2lPVtCB6c3viyNv5bY6QHvfpXrx706r3ePnS1rzbBmsp509gy_CWJkFgqRnvuYseVAPB3FlQRllL6A4q8glc</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1244648104</pqid></control><display><type>article</type><title>Using Coding-Based Ensemble Learning to Improve Software Defect Prediction</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Sun, Zhongbin ; Song, Qinbao ; Zhu, Xiaoyan</creator><creatorcontrib>Sun, Zhongbin ; Song, Qinbao ; Zhu, Xiaoyan</creatorcontrib><description>Using classification methods to predict software defect proneness with static code attributes has attracted a great deal of attention. The class-imbalance characteristic of software defect data makes the prediction much difficult; thus, a number of methods have been employed to address this problem. However, these conventional methods, such as sampling, cost-sensitive learning, Bagging, and Boosting, could suffer from the loss of important information, unexpected mistakes, and overfitting because they alter the original data distribution. 
This paper presents a novel method that first converts the imbalanced binary-class data into balanced multiclass data and then builds a defect predictor on the multiclass data with a specific coding scheme. A thorough experiment with four different types of classification algorithms, three data coding schemes, and six conventional imbalance data-handling methods was conducted over the 14 NASA datasets. The experimental results show that the proposed method with a one-against-one coding scheme is averagely superior to the conventional methods.</description><identifier>ISSN: 1094-6977</identifier><identifier>ISSN: 2168-2291</identifier><identifier>EISSN: 1558-2442</identifier><identifier>EISSN: 2168-2305</identifier><identifier>DOI: 10.1109/TSMCC.2012.2226152</identifier><identifier>CODEN: ITCRFH</identifier><language>eng</language><publisher>New-York, NY: IEEE</publisher><subject>Algorithmics. Computability. Computer arithmetics ; Algorithms ; Applied sciences ; Artificial intelligence ; Boosting ; Class-imbalance data ; Classification ; Coding ; Computer programs ; Computer science; control theory; systems ; Computer systems performance. Reliability ; Data processing. List processing. Character string processing ; Defects ; Encoding ; Exact sciences and technology ; Learning ; Memory organisation. Data processing ; meta learning ; multiclassifier ; Prediction algorithms ; Predictive models ; Sampling ; Software ; Software algorithms ; software defect prediction ; Software defects ; Studies ; Theoretical computing</subject><ispartof>IEEE transactions on human-machine systems, 2012-11, Vol.42 (6), p.1806-1817</ispartof><rights>2014 INIST-CNRS</rights><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) Nov 2012</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c358t-febbb81136e4f4dacbc9480b38e6a40b49c60fb98d62786a55279d6fe5f8e1d43</citedby><cites>FETCH-LOGICAL-c358t-febbb81136e4f4dacbc9480b38e6a40b49c60fb98d62786a55279d6fe5f8e1d43</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6392473$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,54771</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&amp;idt=26818943$$DView record in Pascal Francis$$Hfree_for_read</backlink></links><search><creatorcontrib>Sun, Zhongbin</creatorcontrib><creatorcontrib>Song, Qinbao</creatorcontrib><creatorcontrib>Zhu, Xiaoyan</creatorcontrib><title>Using Coding-Based Ensemble Learning to Improve Software Defect Prediction</title><title>IEEE transactions on human-machine systems</title><addtitle>TSMCC</addtitle><description>Using classification methods to predict software defect proneness with static code attributes has attracted a great deal of attention. The class-imbalance characteristic of software defect data makes the prediction much difficult; thus, a number of methods have been employed to address this problem. However, these conventional methods, such as sampling, cost-sensitive learning, Bagging, and Boosting, could suffer from the loss of important information, unexpected mistakes, and overfitting because they alter the original data distribution. This paper presents a novel method that first converts the imbalanced binary-class data into balanced multiclass data and then builds a defect predictor on the multiclass data with a specific coding scheme. 
A thorough experiment with four different types of classification algorithms, three data coding schemes, and six conventional imbalance data-handling methods was conducted over the 14 NASA datasets. The experimental results show that the proposed method with a one-against-one coding scheme is averagely superior to the conventional methods.</description><subject>Algorithmics. Computability. Computer arithmetics</subject><subject>Algorithms</subject><subject>Applied sciences</subject><subject>Artificial intelligence</subject><subject>Boosting</subject><subject>Class-imbalance data</subject><subject>Classification</subject><subject>Coding</subject><subject>Computer programs</subject><subject>Computer science; control theory; systems</subject><subject>Computer systems performance. Reliability</subject><subject>Data processing. List processing. Character string processing</subject><subject>Defects</subject><subject>Encoding</subject><subject>Exact sciences and technology</subject><subject>Learning</subject><subject>Memory organisation. 
Data processing</subject><subject>meta learning</subject><subject>multiclassifier</subject><subject>Prediction algorithms</subject><subject>Predictive models</subject><subject>Sampling</subject><subject>Software</subject><subject>Software algorithms</subject><subject>software defect prediction</subject><subject>Software defects</subject><subject>Studies</subject><subject>Theoretical computing</subject><issn>1094-6977</issn><issn>2168-2291</issn><issn>1558-2442</issn><issn>2168-2305</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><recordid>eNpdkE1Lw0AQhoMoWKt_QC8BEbyk7nd2jxqrVioKbc_LZjMrKfmou6nivzexxYOnGZjnfRmeKDrHaIIxUjfLxUuWTQjCZEIIEZiTg2iEOZcJYYwc9jtSLBEqTY-jkxDWCGHGFB1Fz6tQNu9x1hb9SO5MgCKeNgHqvIJ4DsY3w7lr41m98e0nxIvWdV_GQ3wPDmwXv3koStuVbXMaHTlTBTjbz3G0epgus6dk_vo4y27niaVcdomDPM8lxlQAc6wwNreKSZRTCcIwlDNlBXK5koUgqRSGc5KqQjjgTgIuGB1H17ve_qGPLYRO12WwUFWmgXYbNKaYi5RQmvbo5T903W5903-ncS9GMInRUEh2lPVtCB6c3viyNv5bY6QHvfpXrx706r3ePnS1rzbBmsp509gy_CWJkFgqRnvuYseVAPB3FlQRllL6A4q8glc</recordid><startdate>20121101</startdate><enddate>20121101</enddate><creator>Sun, Zhongbin</creator><creator>Song, Qinbao</creator><creator>Zhu, Xiaoyan</creator><general>IEEE</general><general>Institute of Electrical and Electronics Engineers</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7TB</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>F28</scope></search><sort><creationdate>20121101</creationdate><title>Using Coding-Based Ensemble Learning to Improve Software Defect Prediction</title><author>Sun, Zhongbin ; Song, Qinbao ; Zhu, Xiaoyan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c358t-febbb81136e4f4dacbc9480b38e6a40b49c60fb98d62786a55279d6fe5f8e1d43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Algorithmics. Computability. Computer arithmetics</topic><topic>Algorithms</topic><topic>Applied sciences</topic><topic>Artificial intelligence</topic><topic>Boosting</topic><topic>Class-imbalance data</topic><topic>Classification</topic><topic>Coding</topic><topic>Computer programs</topic><topic>Computer science; control theory; systems</topic><topic>Computer systems performance. Reliability</topic><topic>Data processing. List processing. Character string processing</topic><topic>Defects</topic><topic>Encoding</topic><topic>Exact sciences and technology</topic><topic>Learning</topic><topic>Memory organisation. 
Data processing</topic><topic>meta learning</topic><topic>multiclassifier</topic><topic>Prediction algorithms</topic><topic>Predictive models</topic><topic>Sampling</topic><topic>Software</topic><topic>Software algorithms</topic><topic>software defect prediction</topic><topic>Software defects</topic><topic>Studies</topic><topic>Theoretical computing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sun, Zhongbin</creatorcontrib><creatorcontrib>Song, Qinbao</creatorcontrib><creatorcontrib>Zhu, Xiaoyan</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><jtitle>IEEE transactions on human-machine systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sun, Zhongbin</au><au>Song, Qinbao</au><au>Zhu, Xiaoyan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Using Coding-Based Ensemble Learning to Improve Software Defect Prediction</atitle><jtitle>IEEE transactions on human-machine 
systems</jtitle><stitle>TSMCC</stitle><date>2012-11-01</date><risdate>2012</risdate><volume>42</volume><issue>6</issue><spage>1806</spage><epage>1817</epage><pages>1806-1817</pages><issn>1094-6977</issn><issn>2168-2291</issn><eissn>1558-2442</eissn><eissn>2168-2305</eissn><coden>ITCRFH</coden><abstract>Using classification methods to predict software defect proneness with static code attributes has attracted a great deal of attention. The class-imbalance characteristic of software defect data makes the prediction much difficult; thus, a number of methods have been employed to address this problem. However, these conventional methods, such as sampling, cost-sensitive learning, Bagging, and Boosting, could suffer from the loss of important information, unexpected mistakes, and overfitting because they alter the original data distribution. This paper presents a novel method that first converts the imbalanced binary-class data into balanced multiclass data and then builds a defect predictor on the multiclass data with a specific coding scheme. A thorough experiment with four different types of classification algorithms, three data coding schemes, and six conventional imbalance data-handling methods was conducted over the 14 NASA datasets. The experimental results show that the proposed method with a one-against-one coding scheme is averagely superior to the conventional methods.</abstract><cop>New-York, NY</cop><pub>IEEE</pub><doi>10.1109/TSMCC.2012.2226152</doi><tpages>12</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1094-6977
ispartof IEEE transactions on human-machine systems, 2012-11, Vol.42 (6), p.1806-1817
issn 1094-6977
2168-2291
1558-2442
2168-2305
language eng
recordid cdi_pascalfrancis_primary_26818943
source IEEE Electronic Library (IEL) Journals
subjects Algorithmics. Computability. Computer arithmetics
Algorithms
Applied sciences
Artificial intelligence
Boosting
Class-imbalance data
Classification
Coding
Computer programs
Computer science; control theory; systems
Computer systems performance. Reliability
Data processing. List processing. Character string processing
Defects
Encoding
Exact sciences and technology
Learning
Memory organisation. Data processing
meta learning
multiclassifier
Prediction algorithms
Predictive models
Sampling
Software
Software algorithms
software defect prediction
Software defects
Studies
Theoretical computing
title Using Coding-Based Ensemble Learning to Improve Software Defect Prediction
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T04%3A44%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pasca&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Using%20Coding-Based%20Ensemble%20Learning%20to%20Improve%20Software%20Defect%20Prediction&rft.jtitle=IEEE%20transactions%20on%20human-machine%20systems&rft.au=Sun,%20Zhongbin&rft.date=2012-11-01&rft.volume=42&rft.issue=6&rft.spage=1806&rft.epage=1817&rft.pages=1806-1817&rft.issn=1094-6977&rft.eissn=1558-2442&rft.coden=ITCRFH&rft_id=info:doi/10.1109/TSMCC.2012.2226152&rft_dat=%3Cproquest_pasca%3E2851366401%3C/proquest_pasca%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c358t-febbb81136e4f4dacbc9480b38e6a40b49c60fb98d62786a55279d6fe5f8e1d43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1244648104&rft_id=info:pmid/&rft_ieee_id=6392473&rfr_iscdi=true